SMOTE
SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
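The abstract's idea of creating synthetic minority class examples and combining them with under-sampling of the majority class can be illustrated with a minimal sketch. It assumes the usual reading of SMOTE, in which a synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function names `smote_sketch` and `undersample_majority`, their parameters, and the random choice of base sample are illustrative assumptions, not the authors' reference implementation.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper: a simplified illustration of the SMOTE idea,
    not the procedure exactly as specified in the paper.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        # Simplification: pick the base sample at random.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def undersample_majority(majority, n_keep, seed=0):
    """Keep a random subset of the majority class (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    majority = [(float(i), 5.0) for i in range(20)]
    # Rebalance: grow the minority class and shrink the majority class.
    balanced_minority = minority + smote_sketch(minority, 6, k=2)
    balanced_majority = undersample_majority(majority, 10)
    print(len(balanced_minority), len(balanced_majority))
```

Because every synthetic point lies between two existing minority samples, the new examples stay inside the region already occupied by the minority class rather than duplicating existing points exactly.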
References in zbMATH (referenced in 163 articles, 1 standard article)
- Imoussaten, Abdelhak; Jacquin, Lucie: Cautious classification based on belief functions theory and imprecise relabelling (2022)
- Johnson, Marina; Albizri, Abdullah; Simsek, Serhat: Artificial intelligence in healthcare operations to enhance treatment outcomes: a framework to predict lung cancer prognosis (2022)
- Liu, Xin; He, Wenqing: Adaptive kernel scaling support vector machine with application to a prostate cancer image study (2022)
- Li, Yanling; Oravecz, Zita; Zhou, Shuai; Bodovski, Yosef; Barnett, Ian J.; Chi, Guangqing; Zhou, Yuan; Friedman, Naomi P.; Vrieze, Scott I.; Chow, Sy-Miin: Bayesian forecasting with a regime-switching zero-inflated multilevel Poisson regression model: an application to adolescent alcohol use with spatial covariates (2022)
- Loynes, Christopher; Ouenniche, Jamal; De Smedt, Johannes: The detection and location estimation of disasters using Twitter and the identification of non-governmental organisations using crowdsourcing (2022)
- Mitra, Sinjini; Le, Kenny: The effect of cognitive and behavioral factors on student success in a bottleneck business statistics course via deeper analytics (2022)
- Mojiri, Arezou; Khalili, Abbas; Hamadani, Ali Zeinal: New hard-thresholding rules based on data splitting in high-dimensional imbalanced classification (2022)
- Quesnel, Frédéric; Wu, Alice; Desaulniers, Guy; Soumis, François: Deep-learning-based partial pricing in a branch-and-price algorithm for personalized crew rostering (2022)
- Seitshiro, M. B.; Mashele, H. P.: Quantification of model risk that is caused by model misspecification (2022)
- Welchowski, Thomas; Maloney, Kelly O.; Mitchell, Richard; Schmid, Matthias: Techniques to improve ecological interpretability of black-box machine learning models. Case study on biological health of streams in the United States with gradient boosted trees (2022)
- Huang, Yuxiao; Ma, Yan: CIGAN: A Python Package for Handling Class Imbalance using Generative Adversarial Networks (2022) arXiv
- Akalin, Altuna: Computational genomics with R. With the assistance of Verdan Franke, Bora Uyar and Jonathan Ronen (2021)
- Aminian, Ehsan; Ribeiro, Rita P.; Gama, João: Chebyshev approaches for imbalanced data streams regression models (2021)
- Barella, Victor H.; Garcia, Luís P. F.; de Souto, Marcilio C. P.; Lorena, Ana C.; de Carvalho, André C. P. L. F.: Assessing the data complexity of imbalanced datasets (2021)
- Bej, Saptarshi; Davtyan, Narek; Wolfien, Markus; Nassar, Mariam; Wolkenhauer, Olaf: LoRAS: an oversampling approach for imbalanced datasets (2021)
- Bernardo, Alessio; Della Valle, Emanuele: VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams (2021)
- Bulavas, Viktoras; Marcinkevičius, Virginijus; Rumiński, Jacek: Study of multi-class classification algorithms’ performance on highly imbalanced network intrusion datasets (2021)
- Cao, Yi; Liu, Xiaoquan; Zhai, Jia: Option valuation under no-arbitrage constraints with neural networks (2021)
- Chen, Baiyun; Xia, Shuyin; Chen, Zizhong; Wang, Binggui; Wang, Guoyin: RSMOTE: a self-adaptive robust SMOTE for imbalanced problems with label noise (2021)
- Chen, Zhi; Duan, Jiang; Kang, Li; Qiu, Guoping: A hybrid data-level ensemble to enable learning from highly imbalanced dataset (2021)