Class Imbalanced NIPT Data Processing based on Ensemble Learning: Focusing on Early Detection of Chromosomal Abnormalities in Female Fetuses

Authors

  • Lu Kun
  • Gaojun Jin
  • Jiaming Huang

DOI:

https://doi.org/10.54691/3vn3gv19

Keywords:

Class imbalance, ensemble learning, RUSBoost, non-invasive prenatal testing, recall rate.

Abstract

To address class imbalance in non-invasive prenatal testing (NIPT) data from pregnant women, improve the accuracy of early prenatal detection of abnormalities in female fetuses, and reduce the risk of a shortened treatment window, this paper compares four ensemble learning models (EasyEnsemble, RUSBoost, AdaBoost, and BalancedRandomForest) on the classification of chromosomal aneuploidy labels, with the aim of identifying the best approach for handling imbalanced data. First, the dataset was preprocessed: irrelevant feature columns were removed, categorical variables were converted with label encoding, and the data were split into a training set (80%) and a testing set (20%). Next, the models trained on the training set were applied to the testing set, and their predictions were compared against the ground-truth labels. Finally, the classification results were visualized with confusion matrices and other charts, and performance metrics such as accuracy, precision, and F1 score were calculated to compare the four models from several perspectives. The experimental results show that the four models differ in how well they classify chromosomal aneuploidy labels. Overall, RUSBoost performs best, with an accuracy of 87.91% and good recall, indicating strong robustness and generalization. This provides a practical solution for the early clinical detection of abnormalities in female fetuses.
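The abstract's argument rests on two ideas: random undersampling, the resampling step that RUSBoost applies before each boosting round and EasyEnsemble applies to build each balanced subset, and the gap between accuracy and recall when labels are imbalanced. The following is a minimal standard-library Python sketch of both ideas, not the authors' code; the function names, the toy data, and the label convention (0 = normal, 1 = aneuploid) are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class has as many
    rows as the smallest class (all minority-class rows are kept)."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, n_min))  # sample n_min rows of this class
    keep.sort()  # restore the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall and F1,
    treating `positive` (here: aneuploid) as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: 90 normal (0) samples and 10 aneuploid (1) samples.
y_true = [0] * 90 + [1] * 10
X = [[float(i)] for i in range(100)]

# A model that always predicts "normal" scores accuracy 0.9 but recall 0.0:
# it misses every abnormal fetus, which is why recall is reported alongside accuracy.
always_normal = [0] * 100
print(binary_metrics(y_true, always_normal))

# After undersampling, both classes have 10 rows each.
X_bal, y_bal = random_undersample(X, y_true)
print(Counter(y_bal))
```

Ready-made versions of the four compared models exist as `EasyEnsembleClassifier`, `RUSBoostClassifier`, and `BalancedRandomForestClassifier` in the imbalanced-learn package and as `AdaBoostClassifier` in scikit-learn; the abstract does not state which implementation the authors used.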

Published

2026-03-31

Issue

Vol. 8, No. 3

Section

Articles

How to Cite

Kun, Lu, Gaojun Jin, and Jiaming Huang. 2026. “Class Imbalanced NIPT Data Processing Based on Ensemble Learning: Focusing on Early Detection of Chromosomal Abnormalities in Female Fetuses”. Scientific Journal of Intelligent Systems Research 8 (3): 62-71. https://doi.org/10.54691/3vn3gv19.