Improved Sampling and Feature Selection to Support Extreme Gradient Boosting For PCOS Diagnosis

Abstract

Polycystic Ovary Syndrome (PCOS) is one of the most common causes of female infertility. It affects a large number of women of reproductive age, and its effects can continue well beyond the childbearing years. This hormonal disorder may also raise the risk of other long-term complications. Given the strong recognition ability and probabilistic nature of ensemble-based gradient boosting algorithms, particularly in the medical domain, we propose the use of Extreme Gradient Boosting (XGBoost) for early detection of PCOS. To support effective classification performance, we resampled our data using a combination of SMOTE (Synthetic Minority Oversampling Technique) and ENN (Edited Nearest Neighbour) to address class imbalance and data outliers. In addition, using two popular statistical methods, the ANOVA test and the Chi-Square test, we identified the 23 most significant metabolic and clinical parameters for classifying PCOS conditions. Finally, we evaluated our model on a benchmark dataset collected from Kaggle. The Extreme Gradient Boosting classifier outperformed all other classifiers, with an overall 10-fold cross-validation score of 96.03% and a 98% recall in the detection of patients not having PCOS. This outperforms all existing recent methods in which numerical data-driven diagnosis of PCOS has been studied on this particular dataset.
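The resampling idea named in the abstract can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' implementation: the function names (`smote`, `enn`, `smote_enn`), the toy data, the choice of `k=3`, and the majority-vote form of the ENN editing rule are all assumptions made for illustration. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours; ENN then drops samples whose label disagrees with their neighbourhood.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point, candidates, k):
    """Indices of the k candidates closest to point."""
    order = sorted(range(len(candidates)),
                   key=lambda i: euclidean(point, candidates[i]))
    return order[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        idx = rng.choice(k_nearest(base, others, min(k, len(others))))
        neighbour = others[idx]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote variant, an assumption):
    keep a sample only if its label matches the most common label among
    its k nearest neighbours in the unedited set."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others_X = X[:i] + X[i + 1:]
        others_y = y[:i] + y[i + 1:]
        votes = [others_y[j] for j in k_nearest(xi, others_X, k)]
        majority = max(set(votes), key=votes.count)
        if majority == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class to parity, then clean with ENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)
    new = smote(minority, n_new, k=k, rng=random.Random(seed))
    return enn(X + new, y + [minority_label] * len(new), k=k)


# Hypothetical toy data: six majority samples (label 0), three minority (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.1, 2.2], [2.2, 1.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = smote_enn(X, y)
```

On this toy data the two classes are well separated, so ENN removes nothing and the resampled set is balanced. In practice a maintained library implementation would normally be preferred over a hand-written one, followed by feature selection and the classifier.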

Publication
IEEE Annual Computing and Communication Workshop and Conference (CCWC)

Rizwan Hasan
Software Engineer
