Breast Cancer Histopathological Image Classification: A Deep Learning Approach

By Mehdi Habibzadeh Motlagh, Mahboobeh Jannesari, HamidReza Aboulkheyr, Pegah Khosravi, Olivier Elemento, Mehdi Totonchi, Iman Hajirasouliha

Posted 04 Jan 2018
bioRxiv DOI: 10.1101/242818

Breast cancer remains the most common type of cancer and the leading cause of cancer-induced mortality among women, with 2.4 million new cases diagnosed and 523,000 deaths per year. Historically, diagnosis has been performed initially by clinical screening, followed by histopathological analysis. Automated classification of cancers from histopathological images is a challenging task that requires accurate detection of tumor sub-types. This process could be facilitated by machine learning approaches, which may be more reliable and economical than conventional methods. As a proof of principle, we applied fine-tuned pre-trained deep neural networks. To test the approach, we first classified different cancer types using 6,402 tissue microarray (TMA) training samples. Our framework accurately detected on average 99.8% of the four cancer types (breast, bladder, lung and lymphoma) using the ResNet V1 50 pre-trained model. Then, for classification of breast cancer sub-types, this approach was applied to 7,909 images from the BreakHis database. In the next step, ResNet V1 152 classified benign and malignant breast cancers with an accuracy of 98.7%. In addition, ResNet V1 50 and ResNet V1 152 categorized either benign (adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma) or malignant (ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma) sub-types with 94.8% and 96.4% accuracy, respectively. The confusion matrices revealed high sensitivity values of 1, 0.995 and 0.993 for cancer types, malignant sub-types and benign sub-types, respectively. The area under the curve (AUC) scores were 0.996, 0.973 and 0.996 for cancer types, malignant sub-types and benign sub-types, respectively. Overall, our results show negligible numbers of false negatives (on average 3.7 samples) and false positives (on average 2 samples) across the different models.
Availability: Source code, guidelines, and data sets are temporarily available on Google Drive upon request, pending a move to a permanent GitHub repository.
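
The sensitivity, false-negative and false-positive figures quoted in the abstract are all read off confusion matrices. As a minimal sketch of that calculation (not the authors' code), the Python below computes per-class sensitivity and false-positive counts from a square confusion matrix. The function names and the toy counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def per_class_sensitivity(confusion):
    """Return sensitivity (recall) for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    sensitivities = []
    for i, row in enumerate(confusion):
        true_positives = row[i]
        actual_total = sum(row)  # true positives + false negatives for class i
        sensitivities.append(true_positives / actual_total if actual_total else 0.0)
    return sensitivities


def false_positive_counts(confusion):
    """Return, for each class, how many samples were wrongly assigned to it."""
    n = len(confusion)
    return [sum(confusion[r][c] for r in range(n) if r != c) for c in range(n)]


# Toy counts for illustration only; NOT results from the paper.
toy_confusion = [
    [48, 2],  # true benign: 48 predicted benign, 2 predicted malignant
    [1, 49],  # true malignant: 1 predicted benign, 49 predicted malignant
]

sensitivities = per_class_sensitivity(toy_confusion)
false_positives = false_positive_counts(toy_confusion)
print(sensitivities, false_positives)
```

The same two functions apply unchanged to a multi-class matrix, such as one with a row per cancer type or per sub-type, since each class is scored one-versus-rest.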

Download data

  • Downloaded 7,168 times
  • Download rankings, all-time:
    • Site-wide: 1,055
    • In bioinformatics: 72
  • Year to date:
    • Site-wide: None
  • Since beginning of last month:
    • Site-wide: 19,390

