Multi-Attention Mechanism Fusion for Fine-Grained Image Classification

Rong Du, Dongmei Ma


In recent years, image classification has moved to the fine-grained level, which has become a new research hotspot. Compared with traditional image classification, fine-grained image classification poses additional difficulties, in part because of variation in image shooting scenes. Attention mechanisms have therefore been widely used in fine-grained image classification, but traditional attention methods first localize and then process, so the model must run step by step, and only a single attention method is used. To further improve the performance of deep convolutional neural networks on fine-grained image classification, this paper studies an end-to-end, weakly supervised fine-grained image classification model that fuses multiple attention mechanisms. It presents a deep convolutional network model for fine-grained images embedded with four attention focusing mechanisms: the class activation mapping (CAM) attention focusing method, the channel attention (CA) focusing method, the spatial attention (SA) focusing method, and the channel-spatial confusion attention (CSCA) focusing method. Experiments on the fine-grained image classification datasets CUB-200-2011, Stanford Dogs, and Stanford Cars show that all four attention focusing methods can focus on local features and improve the classification performance of the convolutional network, with the channel-spatial confusion attention focusing method yielding the most significant improvement.


Fine-Grained Image Classification; Convolutional Neural Network; Attention Focus; Multi-Scale Learning
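
As a rough illustration of three of the attention ideas named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' model: the function names are hypothetical, the feature map is a small nested list, and the channel and spatial gates are parameter-free simplifications (global average pooling followed by a sigmoid), whereas practical networks usually insert learned layers before the gate. The class activation map is computed in the usual way, as a weighted sum of feature channels using the classifier weights of one class.

```python
import math


def sigmoid(x):
    """Logistic function used as the attention gate."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Rescale each channel by a gate derived from its global average.

    fmap is a list of C channels, each an H x W list of lists.
    Assumption: no learned layers between the pooling and the sigmoid.
    """
    gates = []
    for channel in fmap:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gates.append(sigmoid(total / count))
    return [[[v * g for v in row] for row in channel]
            for channel, g in zip(fmap, gates)]


def spatial_attention(fmap):
    """Rescale each spatial position by a gate derived from the channel mean.

    Assumption: the mask is the sigmoid of the per-position channel average,
    with no learned convolution.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * mask[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def class_activation_map(fmap, class_weights):
    """Weighted sum of channels using one class's classifier weights."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[sum(wk * channel[i][j]
                 for wk, channel in zip(class_weights, fmap))
             for j in range(w)] for i in range(h)]


# Toy feature map with 2 channels of size 2 x 2 (illustrative values only).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
cam = class_activation_map(features, [1.0, 0.5])
reweighted_by_channel = channel_attention(features)
reweighted_by_position = spatial_attention(features)
```

All three functions keep the spatial size of the input, so in this simplified form they could be chained or combined in a single forward pass; how the paper actually fuses them is not specified in the abstract.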




Copyright (c) 2022 Rong Du

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.