RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm
Witten and Tibshirani (2010) proposed an algorithm, called sparse K-means (SK-means), that simultaneously finds clusters and selects clustering variables. SK-means is particularly useful when the dataset has a large fraction of noise variables (that is, variables that carry no useful information for separating the clusters). SK-means works very well on clean, complete data but cannot handle outliers or missing data. To remedy these problems, we introduce a new robust and sparse K-means clustering algorithm (RSK-means), implemented in the R package RSKC. We demonstrate the use of our package on four datasets. We also conduct a Monte Carlo study to compare the performance of RSK-means and SK-means in selecting important variables and identifying clusters. Our simulation study shows that RSK-means performs well on clean data and better than SK-means and other competitors on outlier-contaminated data.
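As a rough illustration of how sparse K-means selects variables, here is a minimal Python sketch of its feature-weight step, following the formulation in Witten and Tibshirani (2010): each feature j gets a weight w_j >= 0 built from a soft-thresholded between-cluster sum of squares, under the constraints ||w||_2 <= 1 and ||w||_1 <= s. The full algorithm alternates this step with K-means on the weighted features; that outer loop, and the robustification that distinguishes RSK-means, are not shown. This is not code from the RSKC package (which is written in R). The function names `between_cluster_ss`, `soft_threshold`, `l2_normalize` and `update_weights`, and the toy data, are hypothetical.

```python
import math


def between_cluster_ss(data, labels):
    """Per-feature between-cluster sum of squares (BCSS).

    Computed as total SS minus pooled within-cluster SS for each column.
    This is proportional (by a factor of 2) to the pairwise-distance form in
    Witten and Tibshirani (2010), so the normalised weights are the same.
    """
    n, p = len(data), len(data[0])
    result = []
    for j in range(p):
        col = [row[j] for row in data]
        mean = sum(col) / n
        total = sum((x - mean) ** 2 for x in col)
        within = 0.0
        for k in set(labels):
            members = [col[i] for i in range(n) if labels[i] == k]
            centre = sum(members) / len(members)
            within += sum((x - centre) ** 2 for x in members)
        result.append(total - within)
    return result


def soft_threshold(values, delta):
    """Shrink non-negative values toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def update_weights(bcss, l1_bound, tol=1e-10, max_iter=200):
    """Feature-weight step of sparse K-means (hypothetical helper).

    Maximises sum_j w_j * bcss[j] subject to ||w||_2 <= 1,
    ||w||_1 <= l1_bound and w_j >= 0. The solution is
    w = S(a+, delta) / ||S(a+, delta)||_2, with delta = 0 when that already
    meets the L1 bound; otherwise delta is found by bisection so that
    ||w||_1 is (approximately) equal to l1_bound.
    """
    if l1_bound < 1:
        raise ValueError("l1_bound must be at least 1")
    a = [max(b, 0.0) for b in bcss]  # positive part a+
    w = l2_normalize(a)
    if sum(w) <= l1_bound:
        return w  # delta = 0: the L1 constraint is inactive
    lo, hi = 0.0, max(a)  # delta = max(a) zeroes every weight
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold(a, mid))) > l1_bound:
            lo = mid  # still too dense: shrink harder
        else:
            hi = mid  # bound satisfied: try a smaller delta
        if hi - lo < tol:
            break
    return l2_normalize(soft_threshold(a, hi))


# Toy data (hypothetical): feature 0 separates two groups, features 1-2 are noise.
data = [[0.0, 1.0, 5.0], [0.2, -1.0, 4.0], [5.0, 0.9, 5.1], [5.2, -1.1, 3.9]]
labels = [0, 0, 1, 1]
bcss = between_cluster_ss(data, labels)
weights = update_weights(bcss, l1_bound=1.0)
# Feature 0 keeps essentially all the weight; the noise features drop to 0.
print(weights)
```

With the tightest bound (s = 1) only the feature that separates the two groups keeps a nonzero weight; a looser bound such as s = 1.5 lets the noise features back in with small weights. The L1 bound therefore controls the trade-off between sparsity and using more variables.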
References in zbMATH (referenced in 3 articles, 1 standard article)
- Vouros, Avgoustinos; Langdell, Stephen; Croucher, Mike; Vasilaki, Eleni: An empirical comparison between stochastic and deterministic centroid initialisation for K-means variations (2021)
- Brodinová, Šárka; Filzmoser, Peter; Ortner, Thomas; Breiteneder, Christian; Rohm, Maia: Robust and sparse k-means clustering for high-dimensional data (2019)
- Kondo, Yumi; Salibian-Barrera, Matias; Zamar, Ruben: RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm (2016) (not indexed in zbMATH)