Active Learning for Object Detection on Website Pages

Anders Seth Ward
MAS, 2023
Wu, Yingnian
In this paper we apply transfer learning to the Faster R-CNN ResNet-50 FPN V2 object detection model from Benchmarking Detection Transfer Learning with Vision Transformers [12] in a new problem space: detecting which parts of a webpage can be clicked. The model predicts bounding boxes and class labels for the clickable regions of a website image, along with the result of clicking each one. We found that training the feature extraction layers of the model was detrimental to learning (354% higher loss after 10 epochs), but that training the RPN layers was not. We additionally found that the most effective resizing for website imagery was to 224 × 224 pixels. Since the meta-trainer we developed did not produce usable data, we experimented with the data diversity approach in active learning to select the most representative images to train on. Compared to our control, the model trained on the active learning dataset scored 12.5% better on the validation data. However, with a dataset of only 10 labeled images, the model was unable to generalize and did not predict any of the 15 objects in a labeled image outside the training set.