Description
Over the last few days I traced through the YOLOv2 C code (darknet), and I think this repo misunderstands 'mask' and 'scale'.
In this PyTorch repo, the mask is applied inside the loss function. It helps the network focus on the correct anchor boxes instead of punishing other, irrelevant boxes.
self.iou_loss = nn.MSELoss(size_average=False)(iou_pred * iou_mask, _ious * iou_mask) / num_boxes
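For reference, here is a minimal plain-Python sketch of what the line above computes, written without PyTorch so it runs standalone. In PyTorch, `size_average=False` is the older, deprecated spelling of `reduction='sum'`. The names `masked_sse_loss` and `masked_sse_grad` are hypothetical, and the gradient is derived by hand. One detail worth keeping in mind when comparing against darknet's scales: because the mask multiplies both arguments before squaring, each element ends up weighted by the square of its mask value.

```python
from typing import List


def masked_sse_loss(pred: List[float], target: List[float],
                    mask: List[float], num_boxes: int) -> float:
    """Sum of squared errors between mask*pred and mask*target, divided by num_boxes.

    Mirrors nn.MSELoss(size_average=False)(pred * mask, target * mask) / num_boxes.
    """
    total = 0.0
    for p, t, m in zip(pred, target, mask):
        diff = m * p - m * t
        total += diff * diff
    return total / num_boxes


def masked_sse_grad(pred: List[float], target: List[float],
                    mask: List[float], num_boxes: int) -> List[float]:
    """Per-element gradient d(loss)/d(pred_i) = 2 * m_i**2 * (p_i - t_i) / num_boxes.

    The mask enters squared because it scales both arguments before the difference is squared.
    """
    return [2.0 * m * m * (p - t) / num_boxes for p, t, m in zip(pred, target, mask)]


if __name__ == "__main__":
    pred = [0.9, 0.5]
    target = [0.0, 0.0]
    mask = [1.0, 0.0]  # the second box is masked out entirely
    print(masked_sse_loss(pred, target, mask, num_boxes=1))
    print(masked_sse_grad(pred, target, mask, num_boxes=1))
```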
So how should the right scale_mask be calculated?
In YOLO, the mask depends on the predicted objectness (0~1) of each box.
So if a box's predicted objectness is high (e.g. 0.9) but there is no ground truth at that position, it should be punished. The punishment is noobject_scale * (0 - predicted objectness):
l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]);
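A minimal Python sketch of the darknet no-object term above, for a flat list of predicted boxes. It assumes, following our reading of darknet's region layer, that the delta is reset to zero when a box's best IoU with any ground truth exceeds the threshold. The function name `noobject_delta` and the list-based layout are hypothetical.

```python
from typing import List


def noobject_delta(objectness: List[float], best_ious: List[float],
                   noobject_scale: float, thresh: float) -> List[float]:
    """Darknet-style no-object delta for each predicted box.

    darknet stores delta as scale * (target - output), i.e. the direction the
    output should move. With target 0, a confident box (e.g. objectness 0.9)
    gets a larger push than an unconfident one. Boxes whose best IoU with any
    ground truth exceeds `thresh` get no no-object penalty (assumed darknet behavior).
    """
    deltas = []
    for obj, iou in zip(objectness, best_ious):
        delta = noobject_scale * (0.0 - obj)
        if iou > thresh:
            delta = 0.0
        deltas.append(delta)
    return deltas


if __name__ == "__main__":
    # Box 0: confident, no overlapping ground truth -> penalised.
    # Box 1: overlaps a ground truth above the threshold -> no no-object penalty.
    print(noobject_delta([0.9, 0.2], [0.1, 0.7], noobject_scale=1.0, thresh=0.6))
```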
Hence this term helps the network learn to give each box a reasonable confidence.
However, in this repo, the line
_iou_mask[best_ious <= cfg.iou_thresh] = cfg.noobject_scale
does not consider objectness. It punishes every unqualified box with the same value, so the detector learns objectness very poorly.
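For comparison, a plain-Python equivalent of the repo's mask assignment. Every box whose best IoU is at or below `iou_thresh` receives the same `noobject_scale`, whatever its predicted objectness. The default value of 0 for the remaining boxes is an assumption for illustration, and `repo_style_iou_mask` is a hypothetical name.

```python
from typing import List


def repo_style_iou_mask(best_ious: List[float], iou_thresh: float,
                        noobject_scale: float, default: float = 0.0) -> List[float]:
    """Equivalent of `_iou_mask[best_ious <= iou_thresh] = noobject_scale`.

    `default` (assumed 0.0 here) is the value left for boxes above the threshold.
    Predicted objectness is not an input: the mask value is the same for
    every below-threshold box.
    """
    return [noobject_scale if iou <= iou_thresh else default for iou in best_ious]


if __name__ == "__main__":
    # Both below-threshold boxes get the same scale, regardless of whether
    # they predicted objectness 0.9 or 0.1.
    print(repo_style_iou_mask([0.1, 0.3, 0.8], iou_thresh=0.6, noobject_scale=1.0))
```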
This is the most obvious case; the other 'mask' and 'scale' values are also implemented the wrong way. In fact, YOLO has a more complicated policy for these scale_mask values (some if-else conditions). I also found that YOLO computes the loss before exp() and log() are applied, not after.
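A sketch of the "before exp()/log()" point for box width, assuming the standard YOLOv2 parameterization w = anchor_w * exp(t_w) with all widths in the same units. In the darknet style, the ground-truth width is mapped into the raw target log(gt_w / anchor_w) and the delta is taken on the raw output t_w. The alternative decodes the prediction with exp() first and compares widths. Both function names are hypothetical.

```python
import math


def raw_space_delta(raw_tw: float, anchor_w: float, gt_w: float, scale: float) -> float:
    """Delta taken on the raw network output t_w (before exp()).

    The ground-truth width is moved into the same space with log(),
    giving scale * (log(gt_w / anchor_w) - t_w).
    """
    target_tw = math.log(gt_w / anchor_w)
    return scale * (target_tw - raw_tw)


def decoded_space_delta(raw_tw: float, anchor_w: float, gt_w: float, scale: float) -> float:
    """Alternative: decode the predicted width with exp() first, then compare widths."""
    pred_w = anchor_w * math.exp(raw_tw)
    return scale * (gt_w - pred_w)


if __name__ == "__main__":
    # Same prediction and ground truth, but the two deltas differ in size,
    # so the choice of space changes how hard the width is pushed.
    print(raw_space_delta(0.0, anchor_w=1.0, gt_w=math.e, scale=1.0))
    print(decoded_space_delta(0.0, anchor_w=1.0, gt_w=math.e, scale=1.0))
```

With the decoded form, the gradient with respect to t_w also picks up a factor of anchor_w * exp(t_w), which the raw-space form avoids.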
Fixing the scale_mask bug raises VOC07 test mAP (trained on VOC07+12 trainval) from 0.67 to 0.71, which is much closer to yolo-voc-weights.h5 (0.7221).
You can refer to my code in darknet_v2.py. I am still debugging it and it is not complete yet; I am posting it just to point out what I found.