SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
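
The abstract only says that synthetic minority class examples are created; it does not spell out how. The following is a minimal sketch, assuming the commonly described SMOTE scheme in which a synthetic point is placed at a random position on the line segment between a minority example and one of its k nearest minority neighbours. The function names `smote` and `undersample`, their parameters, and the toy data are illustrative assumptions and do not come from this page. The sketch also picks base points at random instead of looping over every minority example, which is a simplification.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Sketch: interpolate new points between minority examples and their
    k nearest minority neighbours (hypothetical helper, not a library API)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Sketch: randomly keep n_keep majority examples without replacement."""
    return random.Random(seed).sample(majority, n_keep)


# Toy data (assumed for illustration): 4 minority points, 40 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
majority = [(float(i), float(i % 7)) for i in range(5, 45)]

new_points = smote(minority, n_synthetic=8, k=3)
kept = undersample(majority, n_keep=12)
balanced_minority = minority + new_points  # 12 minority vs. 12 majority
```

Because every synthetic point is a convex combination of two existing minority examples, it always lies inside the bounding box of the minority class. Combining the two helpers mirrors the over-sampling plus under-sampling setup that the abstract compares against under-sampling alone.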

References in zbMATH (referenced in 163 articles, 1 standard article)

Showing results 1 to 20 of 163.
Sorted by year (citations)

  1. Imoussaten, Abdelhak; Jacquin, Lucie: Cautious classification based on belief functions theory and imprecise relabelling (2022)
  2. Johnson, Marina; Albizri, Abdullah; Simsek, Serhat: Artificial intelligence in healthcare operations to enhance treatment outcomes: a framework to predict lung cancer prognosis (2022)
  3. Liu, Xin; He, Wenqing: Adaptive kernel scaling support vector machine with application to a prostate cancer image study (2022)
  4. Li, Yanling; Oravecz, Zita; Zhou, Shuai; Bodovski, Yosef; Barnett, Ian J.; Chi, Guangqing; Zhou, Yuan; Friedman, Naomi P.; Vrieze, Scott I.; Chow, Sy-Miin: Bayesian forecasting with a regime-switching zero-inflated multilevel Poisson regression model: an application to adolescent alcohol use with spatial covariates (2022)
  5. Loynes, Christopher; Ouenniche, Jamal; De Smedt, Johannes: The detection and location estimation of disasters using Twitter and the identification of non-governmental organisations using crowdsourcing (2022)
  6. Mitra, Sinjini; Le, Kenny: The effect of cognitive and behavioral factors on student success in a bottleneck business statistics course via deeper analytics (2022)
  7. Mojiri, Arezou; Khalili, Abbas; Hamadani, Ali Zeinal: New hard-thresholding rules based on data splitting in high-dimensional imbalanced classification (2022)
  8. Quesnel, Frédéric; Wu, Alice; Desaulniers, Guy; Soumis, François: Deep-learning-based partial pricing in a branch-and-price algorithm for personalized crew rostering (2022)
  9. Seitshiro, M. B.; Mashele, H. P.: Quantification of model risk that is caused by model misspecification (2022)
  10. Welchowski, Thomas; Maloney, Kelly O.; Mitchell, Richard; Schmid, Matthias: Techniques to improve ecological interpretability of black-box machine learning models. Case study on biological health of streams in the United States with gradient boosted trees (2022)
  11. Huang, Yuxiao; Ma, Yan: CIGAN: A Python Package for Handling Class Imbalance using Generative Adversarial Networks (2022) arXiv
  12. Akalin, Altuna: Computational genomics with R. With the assistance of Vedran Franke, Bora Uyar and Jonathan Ronen (2021)
  13. Aminian, Ehsan; Ribeiro, Rita P.; Gama, João: Chebyshev approaches for imbalanced data streams regression models (2021)
  14. Barella, Victor H.; Garcia, Luís P. F.; de Souto, Marcilio C. P.; Lorena, Ana C.; de Carvalho, André C. P. L. F.: Assessing the data complexity of imbalanced datasets (2021)
  15. Bej, Saptarshi; Davtyan, Narek; Wolfien, Markus; Nassar, Mariam; Wolkenhauer, Olaf: LoRAS: an oversampling approach for imbalanced datasets (2021)
  16. Bernardo, Alessio; Della Valle, Emanuele: VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams (2021)
  17. Bulavas, Viktoras; Marcinkevičius, Virginijus; Rumiński, Jacek: Study of multi-class classification algorithms’ performance on highly imbalanced network intrusion datasets (2021)
  18. Cao, Yi; Liu, Xiaoquan; Zhai, Jia: Option valuation under no-arbitrage constraints with neural networks (2021)
  19. Chen, Baiyun; Xia, Shuyin; Chen, Zizhong; Wang, Binggui; Wang, Guoyin: RSMOTE: a self-adaptive robust SMOTE for imbalanced problems with label noise (2021)
  20. Chen, Zhi; Duan, Jiang; Kang, Li; Qiu, Guoping: A hybrid data-level ensemble to enable learning from highly imbalanced dataset (2021)
