Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Motivation: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282–283; Bioinformatics, 18, 77–82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the underlying algorithm is not limited to protein sequence clustering. Here we present several new programs that use the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST. Availability:
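The abstract does not describe the shared algorithm itself. As a rough illustration only, the following is a minimal sketch of greedy incremental clustering with a short-word (k-mer) filter, the general approach cd-hit is known for. It is not cd-hit's implementation. Assumptions: the function names, the default `threshold` and `k`, and the identity measure are hypothetical. In particular, identity here is an ungapped stand-in (matching positions when the shorter sequence is laid over the start of the representative, divided by the shorter length), chosen because the k-mer filter bound is then exact: with at most `d` mismatches, at least `(L - k + 1) - d*k` words must be shared.

```python
from collections import Counter
from typing import Dict, List


def kmer_counts(seq: str, k: int) -> Counter:
    """Multiset of all length-k words (short words) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ungapped_identity(short: str, long: str) -> float:
    """HYPOTHETICAL stand-in for alignment identity: identical positions when
    `short` is laid over the start of `long`, divided by len(short)."""
    matches = sum(a == b for a, b in zip(short, long))
    return matches / len(short)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9,
                   k: int = 2) -> Dict[str, List[str]]:
    """Greedy incremental clustering (sketch; assumes non-empty sequences and
    0 < threshold <= 1). Sequences are visited longest first; each joins the
    first representative it matches at >= threshold identity, otherwise it
    becomes a new representative."""
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    rep_words: Dict[str, Counter] = {}
    for name in order:
        seq = seqs[name]
        L = len(seq)
        words = kmer_counts(seq, k)
        # Largest mismatch count that still meets the threshold, computed with
        # the same expression as ungapped_identity to avoid rounding surprises.
        max_mismatch = max(d for d in range(L + 1) if (L - d) / L >= threshold)
        # Each mismatch can destroy at most k words, so a qualifying pair must
        # share at least this many words (short-word filter, no false negatives).
        min_shared = (L - k + 1) - max_mismatch * k
        for rep in clusters:
            shared = sum((words & rep_words[rep]).values())
            if shared < min_shared:
                continue  # cannot reach the threshold; skip the costly comparison
            if ungapped_identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative matched: this sequence starts a new cluster.
            clusters[name] = [name]
            rep_words[name] = words
    return clusters


if __name__ == "__main__":
    demo = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTGGGGCC", "d": "ACGTACGT"}
    print(greedy_cluster(demo))
```

The filter is what makes this family of methods fast: the cheap word count rejects most sequence pairs before any identity computation. A real tool would replace the ungapped stand-in with a proper alignment and a correspondingly looser filter bound.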

References in zbMATH (referenced in 46 articles)

Showing results 1 to 20 of 46.


  1. Chen, Guodong; Cao, Man; Yu, Jialin; Guo, Xinyun; Shi, Shaoping: Prediction and functional analysis of prokaryote lysine acetylation site by incorporating six types of features into Chou’s general PseAAC (2019)
  2. Ge, Li; Liu, Jiaguo; Zhang, Yusen; Dehmer, Matthias: Identifying anticancer peptides by using a generalized chaos game representation (2019)
  3. Monod, Anthea; Kališnik, Sara; Patiño-Galindo, Juan Ángel; Crawford, Lorin: Tropical sufficient statistics for persistent homology (2019)
  4. Pan, Yi; Wang, Shiyuan; Zhang, Qi; Lu, Qianzi; Su, Dongqing; Zuo, Yongchun; Yang, Lei: Analysis and prediction of animal toxins by various Chou’s pseudo components and reduced amino acid compositions (2019)
  5. Sahlin, Kristoffer; Medvedev, Paul: De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm (2019)
  6. Srivastava, Abhishikha; Kumar, Ravindra; Kumar, Manish: BlaPred: predicting and classifying β-lactamase using a 3-tier prediction system via Chou’s general PseAAC (2018)
  7. Brubach, Brian; Ghurye, Jay; Pop, Mihai; Srinivasan, Aravind: Better greedy sequence clustering with fast banded alignment (2017)
  8. Dehzangi, Abdollah; López, Yosvany; Lal, Sunil Pranit; Taherzadeh, Ghazaleh; Michaelson, Jacob; Sattar, Abdul; Tsunoda, Tatsuhiko; Sharma, Alok: PSSM-Suc: accurately predicting succinylation using position specific scoring matrix into bigram for feature extraction (2017)
  9. Keith, Jonathan M. (ed.): Bioinformatics. Volume I. Data, sequence analysis, and evolution (2017)
  10. Pai, Priyadarshini P.; Dash, Tirtharaj; Mondal, Sukanta: Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach (2017)
  11. Carugo, Oliviero (ed.); Eisenhaber, Frank (ed.): Data mining techniques for the life sciences (2016)
  12. Ali, Farman; Hayat, Maqsood: Classification of membrane protein types using voting feature interval in combination with Chou’s pseudo amino acid composition (2015)
  13. Arango-Argoty, G. A.; Jaramillo-Garzón, J. A.; Castellanos-Domínguez, G.: Feature extraction by statistical contact potentials and wavelet transform for predicting subcellular localizations in gram negative bacterial proteins (2015)
  14. Kumar, Ravindra; Srivastava, Abhishikha; Kumari, Bandana; Kumar, Manish: Prediction of β-lactamase and its class by Chou’s pseudo-amino acid composition and support vector machine (2015)
  15. Zhao, Xiaowei; Ning, Qiao; Chai, Haiting; Ma, Zhiqiang: Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique (2015)
  16. Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil: Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology (2014)
  17. Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo: Predicting DNA binding proteins using support vector machine with hybrid fractal features (2014)
  18. Yang, Lei; Wang, Jizhe; Wang, Huiping; Lv, Yingli; Zuo, Yongchun; Jiang, Wei: Analysis and identification of toxin targets by topological properties in protein-protein interaction network (2014)
  19. Chen, Yen-Kuang; Li, Kuo-Bin: Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou’s pseudo amino acid composition (2013)
  20. Feng, Peng-Mian; Ding, Hui; Chen, Wei; Lin, Hao: Naïve Bayes classifier with feature selection to identify phage virion proteins (2013)
