HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data.

MOTIVATION: Recent studies have shown that microarray gene expression data are useful for phenotype classification of many diseases. A major problem in this classification is that the number of features (genes) greatly exceeds the number of instances (tissue samples). It has been shown that selecting a small set of informative genes can lead to improved classification accuracy. Many approaches have been proposed for this gene selection problem. Most previous gene ranking methods select the 50-200 top-ranked genes, and these genes are often highly correlated. Our goal is to select a small set of non-redundant marker genes that are most relevant for the classification task.

RESULTS: To achieve this goal, we developed a novel hybrid approach that combines gene ranking and clustering analysis. In this approach, we first applied feature filtering algorithms to select a set of top-ranked genes, and then applied hierarchical clustering to these genes to generate a dendrogram. Finally, the dendrogram was analyzed by a sweep-line algorithm, and marker genes were selected by collapsing dense clusters. An empirical study using three public datasets shows that our approach is capable of selecting relatively few marker genes while offering the same or better leave-one-out cross-validation accuracy compared with approaches that use top-ranked genes directly for classification.

AVAILABILITY: The HykGene software is freely available at http://www.cs.dartmouth.edu/~wyh/software.htm
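The rank-then-cluster idea described under RESULTS can be illustrated with a rough sketch. This is not the HykGene implementation. The function names (`snr_score`, `corr_distance`, `select_markers`) are hypothetical, and several choices are assumptions made only for illustration: a signal-to-noise score stands in for the feature filter, clustering is average linkage on correlation distance, a fixed cut height `cut` replaces the sweep-line analysis of the dendrogram, and each cluster is collapsed to its highest-scoring gene.

```python
import math

def snr_score(values, labels):
    """Two-class signal-to-noise score (an assumed stand-in for the filter step)."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    mean = lambda x: sum(x) / len(x)
    sd = lambda x: math.sqrt(sum((v - mean(x)) ** 2 for v in x) / len(x))
    return abs(mean(a) - mean(b)) / (sd(a) + sd(b) + 1e-9)

def corr_distance(u, v):
    """1 - |Pearson correlation|, so highly correlated genes are close."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    nu = math.sqrt(sum((a - mu) ** 2 for a in u))
    nv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - abs(cov / (nu * nv))

def select_markers(genes, labels, top_k, cut):
    """genes maps gene name -> expression values across samples (labels are 0/1)."""
    # Step 1: rank genes and keep the top_k.
    scores = {g: snr_score(v, labels) for g, v in genes.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # Step 2: naive average-linkage agglomerative clustering of the kept genes.
    clusters = [[g] for g in top]

    def linkage(c1, c2):
        total = sum(corr_distance(genes[a], genes[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut:  # assumed fixed cut height instead of a sweep-line analysis
            break
        clusters[i] += clusters.pop(j)

    # Step 3: collapse each cluster to one representative (assumed: best score).
    return sorted(max(c, key=scores.get) for c in clusters)

# Toy data: g1 and g2 are nearly redundant, g4 carries no class signal.
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1, 2, 1, 8, 9, 8],
    "g2": [2, 4, 3, 16, 18, 16],
    "g3": [3, 6, 1, 7, 4, 9],
    "g4": [5, 5, 6, 5, 6, 5],
}
markers = select_markers(genes, labels, top_k=3, cut=0.2)
```

In this toy input the filter drops `g4`, and the two nearly collinear genes `g1` and `g2` fall into one cluster that is represented by a single marker. A fuller version would choose the cut by comparing leave-one-out cross-validation accuracy across candidate cuts rather than fixing it in advance.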

References in zbMATH (referenced in 9 articles)

  1. Luo, Ziyan; Sun, Defeng; Toh, Kim-Chuan; Xiu, Naihua: Solving the OSCAR and SLOPE models using a semismooth Newton-based augmented Lagrangian method (2019)
  2. Gu, Jason (ed.); Qi, Xiaomei (ed.); Wang, Ying (ed.); Liu, Fei (ed.); Zhang, Chengjin (ed.): Editorial: Advances in methods for networked and cyber-physical system (2014)
  3. Yang, Pengyi; Zhou, Bing Bing; Zhang, Zili; Zomaya, Albert Y.: A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data (2010)
  4. Dessì, Nicoletta; Pes, Barbara: An evolutionary method for combining different feature selection criteria in microarray data classification (2009)
  5. Hong, Jin-Hyuk; Cho, Sung-Bae: Gene boosting for cancer classification based on gene expression profiles (2009)
  6. Li, Der-Chiang; Fang, Yao-Hwei; Lai, Yung-Yao; Hu, Susan C.: Utilization of virtual samples to facilitate cancer identification for DNA microarray data in the early stages of an investigation (2009)
  7. Yang, Tae Young: Efficient multi-class cancer diagnosis algorithm, using a global similarity pattern (2009)
  8. Li, J.; Tang, X.; Liu, J.; Huang, J.; Wang, Y.: A novel approach to feature extraction from classification models based on information gene pairs (2008)
  9. Shen, Qi; Shi, Weimin; Kong, Wei: Hybrid particle swarm optimization and tabu search approach for selecting genes for tumor classification using gene expression data (2008)