Rebar quantity recognition based on Python

biyezuopinvip · 2022-06-23 18:22:10 · Views: 483



Improve accuracy

  • This part approaches the improvements from three directions: the data, the prior boxes (anchor boxes), and the model.

Data improvement

Geometric augmentation

  • The training set this time contains only 250 images, so data augmentation is essential. After looking through the test set, I felt ordinary geometric augmentation would be enough: random horizontal flips, random crops, random rotations, and so on.
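
As a concrete illustration, here is a minimal numpy sketch of one such augmentation, a random horizontal flip that also mirrors the `[x_min, y_min, x_max, y_max]` boxes. The function name and interface are hypothetical, not the competition code:

```python
import numpy as np

def random_horizontal_flip(image, boxes, p=0.5, rng=None):
    """Flip an (H, W, C) image left-right with probability p and mirror
    the [x_min, y_min, x_max, y_max] boxes to match."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p:
        w = image.shape[1]
        image = image[:, ::-1, :]          # reverse the width axis
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]  # mirror and swap x coords
    return image, boxes
```

Random crops and rotations would follow the same pattern: transform the pixels, then apply the matching transform to the box coordinates.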

Mix-up augmentation

  • One point worth mentioning: when analyzing errors later, some hard samples were difficult to recognize, so I tried mix-up augmentation to improve things starting from the data. Mix-up is simply a weighted sum of two images; see the figure below:
     (figure: mix-up example)

  • But mix-up did not work in this scenario: the background of the whole scene is already complex, and superimposing two complex images means a lot of the useful information can no longer be expressed clearly. Model performance did not improve.
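
For reference, a minimal mix-up sketch (hypothetical helper, assuming numpy images and `[x1, y1, x2, y2]` box arrays); the blend weight is drawn from a Beta distribution as in the mix-up paper, and both label sets are kept with the weight attached per box:

```python
import numpy as np

def mixup(img_a, boxes_a, img_b, boxes_b, alpha=1.5, rng=None):
    """Blend two images with a Beta-sampled weight and carry both box
    sets, tagging each box with its blend weight for loss weighting."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    mixed = lam * img_a.astype(np.float32) + (1 - lam) * img_b.astype(np.float32)
    boxes = np.concatenate([
        np.hstack([boxes_a, np.full((len(boxes_a), 1), lam)]),
        np.hstack([boxes_b, np.full((len(boxes_b), 1), 1 - lam)]),
    ])
    return mixed, boxes
```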

"Cramming" (copy-paste) augmentation

  • I first saw this in solution write-ups shared by kaggle competitors. The idea is to cut out certain targets (especially ones that later error analysis shows are poorly detected), paste them onto images that contain no targets, and thereby make the model more robust. When analyzing errors later I found that some stones lying on the rebar were misjudged as rebar, and I think this copy-paste approach should help with that. I didn't try it, partly because of time and partly because the method feels too manual, but I expect it would be effective.
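
Untried here, but the idea can be sketched roughly as follows (hypothetical `paste_object` helper; a real version would also need to avoid overlapping existing targets and blend the patch edges):

```python
import numpy as np

def paste_object(background, bg_boxes, patch, x, y):
    """Paste a cropped object patch onto a background at (x, y) and
    append its bounding box. Assumes the patch fits inside the image."""
    h, w = patch.shape[:2]
    out = background.copy()
    out[y:y + h, x:x + w] = patch
    new_box = np.array([[x, y, x + w, y + h]], dtype=float)
    boxes = new_box if len(bg_boxes) == 0 else np.vstack([bg_boxes, new_box])
    return out, boxes
```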

Prior box (anchor box) improvements

  • We know that YOLO predicts offsets relative to anchor boxes, so the anchor box sizes matter a lot. Let's visualize the widths and heights of the rebar boxes (after normalization):
     (figure: width/height distribution of the rebar boxes)

  • We can see the ratio is basically 1:1. Now compare YOLO v3's default anchor boxes:

10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326

  • They contain both 1:1 and 1:2 shapes, but does that mean we don't need the 1:2 ones? I started by manually nudging them all toward 1:1, and general rebar detection did get better, but detection of many occluded bars (whose boxes are not 1:1) got worse, so manually modifying the ratios is not feasible.
  • So what about using k-means to cluster our own anchor boxes? Let's visualize the result (with the same 9 cluster centers as the original):
  • Nine clusters feels a bit forced, although the mean IoU reaches 0.8793; six or even three cluster centers would probably be enough. In the experiments, however, the clustered anchor boxes performed worse than the original anchor boxes obtained by clustering on COCO. My guess is that this is related to multi-scale (multi_scale) training. Since detection with the original anchors was already good, I gave up on modifying the anchor boxes; if any reader knows the reason, please share your advice.
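
For readers who want to reproduce the clustering, here is a small YOLOv2-style k-means sketch using 1 - IoU as the distance (hypothetical helpers; initialisation is deterministic farthest-point rather than random):

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) box shapes and (K, 2) anchor shapes, assuming
    all boxes share a corner (only width/height matter)."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100):
    """Cluster box shapes with 1 - IoU as distance, YOLOv2-style."""
    # farthest-point initialisation for determinism
    anchors = [boxes[0]]
    for _ in range(k - 1):
        dist = 1 - iou_wh(boxes, np.array(anchors)).max(axis=1)
        anchors.append(boxes[np.argmax(dist)])
    anchors = np.array(anchors)
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(axis=1)
        new = np.array([boxes[assign == i].mean(axis=0)
                        if (assign == i).any() else anchors[i]
                        for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    mean_iou = iou_wh(boxes, anchors).max(axis=1).mean()
    return anchors, mean_iou
```

The returned `mean_iou` is the same metric as the 0.8793 quoted above: the average, over all boxes, of each box's best IoU with any anchor.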

Model improvement

FPN (feature pyramid networks)

 (figure: FPN structure)

  • We know that FPN (feature pyramid network) uses cross-layer operations such as concat so that the feature map used by every prediction layer fuses semantic features from different depths, which strengthens detection. YOLO v3 fuses the feature layers of two adjacent scales. I found that many rebars at the image edges were poorly detected, and one of my guesses is that the feature fusion is not strong enough, so I manually changed the 52×52 prediction branch (of the three prediction branches, the one responsible for small objects) to fuse features from all 3 prediction branches. The experiments showed an improvement, but the results were unstable overall, so I later changed the code back; only afterwards did I discover that the submission with that change scored highest.
  • Another point: this scenario detects many targets of a single category, and rebar, being the only class, does not carry strong semantic information, so I don't think an FPN as deep as YOLO's is needed. One could consider merging the deepest FPN level into a shallower one, which should improve the results.
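
A rough numpy sketch of the three-branch fusion idea: upsample the two deeper prediction branches and concatenate them onto the 52×52 branch along the channel axis. Shapes and helper names are illustrative only, not the actual YOLO v3 code:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_three_branches(p_small, p_mid, p_large):
    """Fuse all three prediction scales into the large (e.g. 52x52)
    branch: upsample 13x13 and 26x26 maps, then concat on channels."""
    return np.concatenate([
        p_large,                          # 52x52 branch
        upsample2x(p_mid),                # 26x26 -> 52x52
        upsample2x(upsample2x(p_small)),  # 13x13 -> 52x52
    ], axis=-1)
```

In a real network the concat would be followed by convolutions to mix the channels; this only shows the routing.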

Dilated (atrous) convolution

  • As mentioned above, many rebars at the edges of the image were not detected well. Reviewing this afterwards, I think it is related to the receptive field of darknet's output being somewhat small, so a dilated convolution could be added after darknet's last layer to enlarge the receptive field. Simply put, dilated convolution enlarges the receptive field by spreading the kernel's sampling positions apart; a later article will introduce it in detail. I didn't try this because it came up rather late, but readers can give it a go.
     (figure: dilated convolution)
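
To make the mechanism concrete, here is a tiny numpy sketch of a "valid" dilated convolution. It is illustrative only, not the darknet layer itself:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=2):
    """'Valid' 2D cross-correlation with a dilated kernel. A k x k
    kernel with dilation d samples a ((k-1)*d + 1)-wide window, so the
    receptive field grows without adding parameters."""
    kh, kw = kernel.shape
    eh = (kh - 1) * dilation + 1  # effective kernel height
    ew = (kw - 1) * dilation + 1  # effective kernel width
    h, w = x.shape
    out = np.zeros((h - eh + 1, w - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = (win * kernel).sum()
    return out
```

With `dilation=1` this reduces to an ordinary convolution; with `dilation=2` a 3×3 kernel covers a 5×5 window.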

Multiscale training

  • We train the model with multi-scale image inputs to make it robust to scale. One note: if the scale is chosen randomly for each input image (as in YunYang's code), the training loss easily becomes NaN. To avoid this, choose the scale randomly between batches rather than per image within a batch.
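
A minimal sketch of the per-batch scale selection (hypothetical helper; the scale list is the usual set of multiples of 32 used in YOLO v3 multi-scale training):

```python
import random

def batch_scales(num_batches,
                 scales=(320, 352, 384, 416, 448, 480, 512, 544, 576, 608),
                 seed=0):
    """Draw one network input size per *batch*, so every image in that
    batch is resized to the same scale; per-image random scales were
    what produced NaN losses."""
    rng = random.Random(seed)
    return [rng.choice(scales) for _ in range(num_batches)]
```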

Background false positives

  • There are many background false positives on the test samples, so it is natural to think of Focal loss. Focal loss has two parameters, α and γ; γ is usually fixed at 2 and not tuned, so I mainly tuned α. In the experiments, however, no matter how I adjusted this parameter, training converged much faster but the test results did not improve. This matches what the YOLO author reported in the paper: Focal loss did not work for them either. It took me a long time to understand why. YOLO divides predictions into three kinds: positives, negatives, and ignored, and a prediction whose IoU with the ground truth exceeds 0.5 is supposed to be ignored. Let's illustrate with YOLO v1:
     (figure: YOLO v1 grid example)

  • The whole red box covers the dog, but not every grid cell inside the red box has an IoU over 0.5 with the dog's ground truth. As a result, some samples that should have been ignored become negatives and join Focal loss's "hard example set", causing label noise, which is why the model got worse after using Focal loss. The fix is simple: lower the ignore threshold from 0.5 to 0.1 or 0.2, and performance immediately improves. (This removes many background false positives; because the threshold is lower there are many more boxes, but they are basically low-scoring and can be filtered out later with the score threshold.)
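
The positive / ignored / negative split can be sketched like this (hypothetical helper operating on a precomputed prediction-to-ground-truth IoU matrix; `assigned` marks the predictions responsible for a ground truth):

```python
import numpy as np

def objectness_masks(pred_gt_iou, assigned, ignore_thresh=0.2):
    """Split predictions into positive / ignored / negative, YOLOv3-style.
    pred_gt_iou: (N, M) IoU of each prediction with each ground truth.
    assigned:    (N,) bool, True where a prediction owns a ground truth.
    Lowering ignore_thresh from 0.5 to ~0.1-0.2 keeps near-object boxes
    out of the negative (background) set."""
    if pred_gt_iou.size:
        best_iou = pred_gt_iou.max(axis=1)
    else:
        best_iou = np.zeros(len(assigned))
    positive = assigned
    ignored = (~assigned) & (best_iou > ignore_thresh)
    negative = (~assigned) & ~ignored
    return positive, ignored, negative
```

With `ignore_thresh=0.5`, a box at IoU 0.3 counts as background; at 0.2 the same box is ignored, which is exactly the change that made Focal loss usable here.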


soft-NMS

  • For dense targets like these, many people will naturally think of replacing NMS with soft-NMS, which an earlier YOLO v3 article also covered. Concretely, instead of directly deleting every box whose IoU with the best box exceeds the threshold (in dense scenes that mistakenly deletes true boxes), soft-NMS only lowers their confidence.
  • I replaced ordinary NMS with Gaussian-weighted soft-NMS and tried σ from 0.1 to 0.7, but the results were quite poor. Here are the result images:
     (figure: soft-NMS detection results)
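
For completeness, a small numpy sketch of Gaussian soft-NMS as described above (hypothetical helpers, following the Gaussian decay from the soft-NMS paper):

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: decay overlapping scores instead of deleting
    boxes outright. Returns indices in pick order."""
    scores = scores.astype(float).copy()
    keep = []
    idx = np.arange(len(scores))
    while len(idx):
        best = idx[np.argmax(scores[idx])]
        keep.append(int(best))
        idx = idx[idx != best]
        if len(idx):
            ov = iou(boxes[best], boxes[idx])
            scores[idx] *= np.exp(-(ov ** 2) / sigma)  # Gaussian decay
            idx = idx[scores[idx] > score_thresh]
    return keep
```

Note the failure mode described above still applies: in very dense rebar scenes the decayed duplicates may survive the score threshold, which is one plausible reason it performed poorly here.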
