ROCK: a robust clustering algorithm for categorical attributes.

Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a similarity measure based on a distance metric (e.g., Euclidean) to partition the database so that data points in the same partition are more similar to one another than to points in different partitions. In this paper, we study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points are not appropriate for boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm, ROCK, that employs links rather than distances when merging clusters. Our methods extend naturally to non-metric similarity measures, which are relevant in situations where a domain expert or similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we conduct an experimental study with real-life as well as synthetic data sets to demonstrate the effectiveness of our techniques. For data with categorical attributes, our findings indicate that ROCK not only generates better-quality clusters than traditional algorithms but also exhibits good scalability properties.
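
The abstract does not spell out how links are computed. As a minimal sketch only, the following assumes the usual description of ROCK: two points are neighbors when their similarity reaches a threshold, and the link count of a pair is the number of neighbors they share. The Jaccard similarity, the threshold name `theta`, the function names, the sample data, and the choice to exclude a point from its own neighbor set are all assumptions made for illustration, not details taken from this record.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_sets(points, theta):
    """For each point index, collect the other indices with similarity >= theta."""
    nbrs = [set() for _ in points]
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def link_counts(points, theta):
    """Map each pair (i, j), i < j, to its number of common neighbors."""
    nbrs = neighbor_sets(points, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(points)), 2)
    }


# Hypothetical categorical records, each represented as a set of items.
points = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y"},
    {"x", "z"},
]
links = link_counts(points, theta=0.5)
```

In this toy data the first three records are pairwise neighbors, so each pair among them shares exactly one common neighbor, while the last two records have no neighbors and therefore no links to anything. A link-based merge criterion would, under these assumptions, favor merging within the first group.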

References in zbMATH (referenced in 73 articles)

Showing results 1 to 20 of 73, sorted by year.


  1. Bury, Marc; Gentili, Michele; Schwiegelshohn, Chris; Sorella, Mara: Polynomial time approximation schemes for all 1-center problems on metric rational set similarities (2021)
  2. Carlsson, Gunnar; Mémoli, Facundo; Segarra, Santiago: Robust hierarchical clustering for directed networks: an axiomatic approach (2021)
  3. Mukhachev, P. A.; Sadretdinov, T. R.; Pritykin, D. A.; Ivanov, A. B.; Solov’ev, S. V.: Modern machine learning methods for telemetry-based spacecraft health monitoring (2021)
  4. Wang, Shuliang; Li, Qi; Zhao, Chuanfeng; Zhu, Xingquan; Yuan, Hanning; Dai, Tianru: Extreme clustering -- a clustering method via density extreme points (2021)
  5. Yu, Liqin; Cao, Fuyuan; Zhao, Xingwang; Yang, Xiaodan; Liang, Jiye: Combining attribute content and label information for categorical data ensemble clustering (2020)
  6. D’Urso, Pierpaolo; Massari, Riccardo: Fuzzy clustering of mixed data (2019)
  7. Uglickich, Evženie; Nagy, Ivan; Vlčková, Dominika: Comparing clusterings using combination of the kappa statistic and entropy-based measure (2019)
  8. Amiri, Saeid; Clarke, Bertrand S.; Clarke, Jennifer L.: Clustering categorical data via ensembling dissimilarity matrices (2018)
  9. Boongoen, Tossapon; Iam-On, Natthakan: Cluster ensembles: a survey of approaches with recent extensions and applications (2018)
  10. Sangam, Ravi Sankar; Om, Hari: An equi-biased k-prototypes algorithm for clustering mixed-type data (2018)
  11. Huang, Jinlong; Zhu, Qingsheng; Yang, Lijun; Cheng, Dongdong; Wu, Quanwang: QCC: a novel clustering algorithm based on quasi-cluster centers (2017)
  12. Huerta-Muñoz, Diana L.; Ríos-Mercado, Roger Z.; Ruiz, Rubén: An iterated greedy heuristic for a market segmentation problem with multiple attributes (2017)
  13. Kim, Kyoungok: A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures (2017)
  14. Vigneron, V.; Chen, H.: A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting (2016)
  15. Dai, Hanbo; Zhu, Feida; Lim, Ee-Peng; Pang, HweeHwa: Detecting anomaly collections using extreme feature ranks (2015)
  16. Khalid, Shehzad; Razzaq, Shahid: TOBAE: a density-based agglomerative clustering algorithm (2015)
  17. Noorbehbahani, Fakhroddin; Mousavi, Sayyed; Mirzaei, Abdolreza: An incremental mixed data clustering method using a new distance measure (2015)
  18. Kang, Pilsung; Kim, Dongil; Cho, Sungzoon: Evaluating the reliability level of virtual metrology results for flexible process control: a novelty detection-based approach (2014)
  19. Lin, Kawuu W.; Lin, Chun-Hung; Hsiao, Chun-Yuan: A parallel and scalable CAST-based clustering algorithm on GPU (2014)
  20. Saha, Indrajit; Maulik, Ujjwal: Incremental learning based multiobjective fuzzy clustering for categorical data (2014)
