A Hybrid Probabilistic Ensemble based Extreme Gradient Boosting Approach For Breast Cancer Diagnosis

Abstract

Breast cancer is one of the most common invasive cancers and the second leading cause of cancer death among women. Survival rates have nevertheless improved dramatically in recent years thanks to advances in screening and treatment, and they depend heavily on how early the disease is detected. Alongside physicians, this has motivated researchers around the world to pursue automated diagnosis strategies for breast cancer. Given the potential of machine learning models in the biomedical domain, a large number of diagnosis methods have been proposed in this direction. In this study, we propose a novel hybrid machine learning framework that integrates the individual prediction probabilities of three classifiers (Logistic Regression, Support Vector Machine, and K-Nearest Neighbors) and then improves on their performance by stacking a gradient boosting algorithm over their combined outputs. The resulting model achieves a 10-fold cross-validation score of 98.4%, a recall of 100%, and a precision of 97.3%. To handle class imbalance, we apply SMOTE (Synthetic Minority Oversampling Technique) to the minority class, and we use Robust Scaling for normalization to limit the influence of outliers in the dataset. Every stage of the framework, from data pre-processing through feature extraction to classification, is adapted to the breast cancer domain. Our framework outperformed several recent state-of-the-art studies in the breast cancer domain.
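The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the abstract does not name the dataset, so scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data stands in for it; scikit-learn's `GradientBoostingClassifier` is swapped in for extreme gradient boosting to keep the example dependency-free; `smote_oversample` is a hypothetical, simplified SMOTE written for this sketch; and the hyperparameters, the single hold-out split, and the scale-then-oversample order are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): balance a binary problem by
    interpolating between minority samples and their minority neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    # Ask for k + 1 neighbours: each point is normally its own nearest one.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Assumed stand-in dataset: target 0 = malignant (minority), 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Robust scaling (median / IQR) is fitted on the training split only.
scaler = RobustScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Oversample only the training data so the test split stays untouched.
X_bal, y_bal = smote_oversample(X_train_s, y_train)

# Base learners emit class probabilities; gradient boosting is stacked on top.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=GradientBoostingClassifier(random_state=0),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_bal, y_bal)
y_pred = stack.predict(X_test_s)

print("recall (malignant):   ", recall_score(y_test, y_pred, pos_label=0))
print("precision (malignant):", precision_score(y_test, y_pred, pos_label=0))
```

The scores printed by this sketch are not expected to match the figures reported in the abstract, since the dataset, booster, and evaluation protocol here are assumptions. To approximate the reported 10-fold protocol, the scaling and oversampling steps would need to be refitted inside each training fold.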

Publication
IEEE Annual Computing and Communication Workshop and Conference (CCWC)
Rizwan Hasan
Software Engineer
