Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction.

With the avalanche of newly found protein sequences emerging in the post-genomic era, it is highly desirable to develop an automated method for quickly and reliably identifying their subcellular locations, because the knowledge thus obtained can provide key clues for revealing their functions and understanding how they interact with each other in cellular networking. However, predicting the subcellular location of eukaryotic proteins is a challenging problem, particularly when unknown query proteins do not have significant homology to proteins of known subcellular locations and when more locations need to be covered. To cope with the challenge, protein samples are formulated by hybridizing the information derived from the gene ontology database and amphiphilic pseudo amino acid composition. Based on such a representation, a novel ensemble hybridization classifier was developed by fusing many basic individual classifiers through a voting system. Each of these basic classifiers was engineered by the KNN (K-Nearest Neighbor) principle. As a demonstration, a new benchmark dataset was constructed that covers the following 18 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cyanelle, (5) cytoplasm, (6) cytoskeleton, (7) endoplasmic reticulum, (8) extracell, (9) Golgi apparatus, (10) hydrogenosome, (11) lysosome, (12) mitochondria, (13) nucleus, (14) peroxisome, (15) plasma membrane, (16) plastid, (17) spindle pole body, and (18) vacuole. To avoid homology bias, none of the proteins included has ≥25% sequence identity to any other in the same subcellular location. The overall success rates thus obtained via the 5-fold and jackknife cross-validation tests were 81.6% and 80.3%, respectively, which were 40-50% higher than those achieved by the other existing methods on the same strict dataset.
The powerful predictor, named "Euk-PLoc", is available as a web server at . Furthermore, to support the needs of people working in the relevant areas, a downloadable file will be provided at the same website to list the results predicted by Euk-PLoc for all eukaryotic protein entries (excluding fragments) in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results will be updated twice a year to include the new entries of eukaryotic proteins and reflect the continuous development of Euk-PLoc.
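
The voting scheme and the jackknife test mentioned in the abstract can be illustrated with a minimal sketch. This is not the Euk-PLoc implementation: the abstract does not say how the basic KNN classifiers differ from one another, so varying k is an assumption here, as are the Euclidean distance, the function names, and the toy two-dimensional vectors that stand in for the real gene ontology and pseudo amino acid composition descriptors.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Label a query vector by majority vote among its k nearest training samples."""
    # train is a list of (feature_vector, label) pairs.
    # Euclidean distance is an assumption made for this sketch.
    neighbors = sorted(train, key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several basic KNN classifiers (one per k, an assumption) by voting."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, ks=(1, 3, 5)):
    """Jackknife (leave-one-out) test: predict each sample from all the others."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += ensemble_predict(rest, features, ks) == label
    return hits / len(samples)


# Hypothetical toy data: two well-separated clusters with location labels.
toy = [
    ((0.0, 0.0), "nucleus"), ((0.1, 0.2), "nucleus"), ((0.2, 0.1), "nucleus"),
    ((1.0, 1.0), "cytoplasm"), ((0.9, 1.1), "cytoplasm"), ((1.1, 0.9), "cytoplasm"),
]

print(ensemble_predict(toy, (0.05, 0.05)))
print(jackknife_accuracy(toy))
```

In this sketch a single basic classifier can be wrong (for example k=5 on a held-out sample, where the other class outnumbers its own) and still be outvoted by the rest, which is the motivation for fusing several classifiers rather than tuning one.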

References in zbMATH (referenced in 24 articles)

Showing results 1 to 20 of 24.

  1. Hussain, Waqar; Khan, Yaser Daanial; Rasool, Nouman; Khan, Sher Afzal; Chou, Kuo-Chen: SPrenylC-PseAAC: a sequence-based model developed via Chou’s 5-steps rule and general PseAAC for identifying S-prenylation sites in proteins (2019)
  2. Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen: pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach (2016)
  3. Jiao, Ya-Sen; Du, Pu-Feng: Predicting Golgi-resident protein types using pseudo amino acid compositions: approaches with positional specific physicochemical properties (2016)
  4. Jiao, Ya-Sen; Du, Pu-Feng: Prediction of Golgi-resident protein types using general form of Chou’s pseudo-amino acid compositions: approaches with minimal redundancy maximal relevance feature selection (2016)
  5. Fatemi, Mohammad H.; Heidari, Afsane; Gharaghani, Sajjad: QSAR prediction of HIV-1 protease inhibitory activities using docking derived molecular descriptors (2015)
  6. Mei, Suyu: SVM ensemble based transfer learning for large-scale membrane proteins discrimination (2014)
  7. Mei, Suyu: Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization (2012)
  8. Chou, Kuo-Chen: Some remarks on protein attribute prediction and pseudo amino acid composition (2011)
  9. Khan, Asifullah; Majid, Abdul; Hayat, Maqsood: CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition (2011)
  10. Mei, S.; Wang, F.; Zhou, S.: Gene ontology based transfer learning for protein subcellular localization (2011)
  11. Esmaeili, Maryam; Mohabatkar, Hassan; Mohsenzadeh, Sasan: Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses (2010)
  12. Guang, Xuanmin; Guo, Yanzhi; Xiao, Jiamin; Wang, Xia; Sun, Jing; Xiong, Wenjia; Li, Menglong: Predicting the state of cysteines based on sequence information (2010)
  13. Yang, Qiang: Three challenges in data mining (2010)
  14. Anand, Ashish; Suganthan, P. N.: Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates (2009)
  15. Blum, Torsten; Briesemeister, Sebastian; Kohlbacher, Oliver: Multiloc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction (2009)
  16. Du, Pufeng; Cao, Shengjiao; Li, Yanda: SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm (2009)
  17. Xu, Qian; Hu, Derek Hao; Xue, Hong; Yu, Weichuan; Yang, Qiang: Semi-supervised protein subcellular localization (2009)
  18. Zeng, Yu-hong; Guo, Yan-zhi; Xiao, Rong-quan; Yang, Li; Yu, Le-zheng; Li, Meng-long: Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach (2009)
  19. Lin, Hao: The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition (2008)
  20. Zhang, Tong-Liang; Ding, Yong-Sheng; Chou, Kuo-Chen: Predicting protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern (2008)
