SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
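
The abstract above only says that the method creates synthetic minority class examples and combines this with under-sampling of the majority class. As an illustration, the following is a minimal sketch of that idea, assuming the commonly described scheme in which a synthetic point is placed at a random position on the line segment between a minority example and one of its k nearest minority neighbours. The function names `smote` and `undersample`, their parameters, and the toy data are hypothetical and chosen for this example; the sketch also picks base points at random rather than iterating over every minority example, so it should not be read as the authors' exact procedure.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Simplified sketch: base points are drawn at random (an assumption made
    for brevity), and neighbours are found by brute-force Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest to the base first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep at most n_keep majority examples (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


# Toy data: 4 minority points near (1, 1), 25 majority points on a grid.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(x), float(y)) for x in range(5, 10) for y in range(5, 10)]

new_points = smote(minority, n_synthetic=8, k=3)
kept = undersample(majority, n_keep=12)

# Rebalanced training set: original + synthetic minority, reduced majority.
balanced = [(p, 1) for p in minority + new_points] + [(p, 0) for p in kept]
```

Because each synthetic point lies on a segment between two existing minority examples, it stays inside the region already occupied by the minority class instead of duplicating an existing example exactly. For practical work, a maintained implementation is likely a better choice than this sketch.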

References in zbMATH (referenced in 125 articles, 1 standard article)

Showing results 1 to 20 of 125.
Sorted by year

  1. Gubela, Robin M.; Lessmann, Stefan; Jaroszewicz, Szymon: Response transformation and profit decomposition for revenue uplift modeling (2020)
  2. Halbersberg, Dan; Wienreb, Maydan; Lerner, Boaz: Joint maximization of accuracy and information for learning the structure of a Bayesian network classifier (2020)
  3. Mahajan, Pravar Dilip; Maurya, Abhinav; Megahed, Aly; Elwany, Alaa; Strong, Ray; Blomberg, Jeanette: Optimizing predictive precision in imbalanced datasets for actionable revenue change prediction (2020)
  4. Ruehle, Fabian: Data science applications to string theory (2020)
  5. Sun, Hongwei; Cui, Yuehua; Gao, Qian; Wang, Tong: Trimmed LASSO regression estimator for binary response data (2020)
  6. Wu, Di; Zhang, Jiangjiang; Geng, Shaojin; Cai, Xingjuan; Zhang, Guoyou: A multi-objective bat algorithm for software defect prediction (2020)
  7. Xie, Jinhan; Hao, Meiling; Liu, Wenxin; Lin, Yuanyuan: Fused variable screening for massive imbalanced data (2020)
  8. Ahmad, Jamal; Hayat, Maqsood: MFSC: multi-voting based feature selection for classification of Golgi proteins by adopting the general form of Chou’s PseAAC components (2019)
  9. Jia, Jianhua; Li, Xiaoyan; Qiu, Wangren; Xiao, Xuan; Chou, Kuo-Chen: iPPI-PseAAC(CGR): identify protein-protein interactions by incorporating chaos game representation into PseAAC (2019)
  10. Kocheturov, Anton; Pardalos, Panos M.; Karakitsiou, Athanasia: Massive datasets and machine learning for computational biomedicine: trends and challenges (2019)
  11. Lai, Chun Sing; Tao, Yingshan; Xu, Fangyuan; Ng, Wing W. Y.; Jia, Youwei; Yuan, Haoliang; Huang, Chao; Lai, Loi Lei; Xu, Zhao; Locatelli, Giorgio: A robust correlation analysis framework for imbalanced and dichotomous data with uncertainty (2019)
  12. Mouratidis, Despoina; Kermanidis, Katia Lida: Ensemble and deep learning for language-independent automatic selection of parallel data (2019)
  13. Park, Soyoung; Carriquiry, Alicia: Learning algorithms to evaluate forensic glass evidence (2019)
  14. Poterie, A.; Dupuy, J.-F.; Monbet, V.; Rouvière, L.: Classification tree algorithm for grouped variables (2019)
  15. Razzaghi, Talayeh; Safro, Ilya; Ewing, Joseph; Sadrfaridpour, Ehsan; Scott, John D.: Predictive models for bariatric surgery risks with imbalanced medical datasets (2019)
  16. Xie, Wenhao; Liang, Gongqian; Dong, Zhonghui; Tan, Baoyu; Zhang, Baosheng: An improved oversampling algorithm based on the samples’ selection strategy for classifying imbalanced data (2019)
  17. Yan, Yuan Ting; Wu, Zeng Bao; Du, Xiu Quan; Chen, Jie; Zhao, Shu; Zhang, Yan Ping: A three-way decision ensemble method for imbalanced data oversampling (2019)
  18. Zarei, Shaho; Mohammadpour, Adel: Using synthetic data and dimensionality reduction in high-dimensional classification via logistic regression (2019)
  19. Zhang, Xueying; Li, Ruixian; Zhang, Bo; Yang, Yunxiang; Guo, Jing; Ji, Xiang: An instance-based learning recommendation algorithm of imbalance handling methods (2019)
  20. Bellinger, Colin; Drummond, Christopher; Japkowicz, Nathalie: Manifold-based synthetic oversampling with manifold conformance estimation (2018)
