Rapid and sensitive sequence comparison with FASTP and FASTA.

The FASTA program can search the NBRF protein sequence library (2.5 million residues) in less than 20 min on an IBM-PC microcomputer and unambiguously detect proteins that shared a common ancestor billions of years in the past. FASTA is both fast and selective because it initially considers only amino acid identities. Its sensitivity is increased not only by using the PAM250 matrix to score and rescore regions with large numbers of identities but also by joining initial regions. The results of searches with FASTA compare favorably with results from NWS-based programs that are 100 times slower: FASTA is slightly less sensitive but considerably more selective. It is not clear that NWS-based programs would be more successful in finding distantly related members of the G-protein-coupled receptor family. The joining step that FASTA uses to calculate the initn score is especially useful for sequences that share regions of similarity separated by variable-length loops.

FASTP and FASTA were designed to identify protein sequences that have descended from a common ancestor, and they have proved very useful for this task. In many cases, a FASTA search will produce either a list of high-scoring library sequences that are homologous to the query sequence or a list of sequences whose similarity scores cannot be distinguished from the bulk of the library. In either case, the question of whether the library contains sequences clearly related to the query has been answered unambiguously. Unfortunately, the results often will not be so clear-cut, and careful analysis and the biological context are required. In the course of analyzing the G-protein-coupled receptor family, several proteins were found that, because of their scores with optimization, appeared to be previously unrecognized members of this family. RDF2 analysis showed borderline z values, and only a careful examination of the sequence alignments that focused on the conserved residues provided convincing evidence that the high scores were fortuitous. As sequence comparison methods become more powerful by becoming more sensitive, they become more likely to mislead, and even greater care is required.
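
The identity-only first pass mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the FASTA implementation: it only counts shared k-tuples (exact identities) on each diagonal between two sequences and reports the densest diagonals. The function name `best_diagonals` and the parameters `ktup` and `top` are hypothetical names chosen for this illustration, and the PAM250 rescoring and region-joining (initn) steps are deliberately left out.

```python
from collections import defaultdict


def best_diagonals(query, subject, ktup=2, top=3):
    """Count shared k-tuples per diagonal (offset = query pos - subject pos).

    Illustrative only: real FASTA rescores the best regions with a
    substitution matrix and joins them; this sketch stops at identities.
    """
    # Index the start positions of every k-tuple in the query.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Each subject k-tuple that also occurs in the query adds one hit
    # to the diagonal on which the two occurrences line up.
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            hits[i - j] += 1

    # Diagonals with the most identical k-tuples come first.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # The shared segment ACDEFG sits two positions later in the subject,
    # so its five overlapping 2-tuples all fall on diagonal -2.
    print(best_diagonals("ACDEFGHIK", "XXACDEFGYY"))
```

Because only exact k-tuple matches are looked up, the work per library sequence is roughly linear in its length. That is the intuition behind the speed claim; sensitivity then depends on the later rescoring steps, which this sketch omits.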

References in zbMATH (referenced in 25 articles)

  1. Gullotto, Danilo; Nolassi, Mario Salvatore; Bernini, Andrea; Spiga, Ottavia; Niccolai, Neri: Probing the protein space for extending the detection of weak homology folds (2013)
  2. Zhu, Wei; Hou, Jingyu; Chen, Yi-Ping Phoebe: Exploiting multi-layered information to iteratively predict protein functions (2012)
  3. Yu, Chenglong; Deng, Mo; Yau, Stephen S.-T.: DNA sequence comparison by a novel probabilistic method (2011)
  4. Homer, Nils; Nelson, Stanley F.; Merriman, Barry: Local alignment of generalized k-base encoded DNA sequence (2010)
  5. Sung, Wing-Kin: Algorithms in bioinformatics. A practical introduction. (2010)
  6. Rubino, Francesco; Attimonelli, Marcella: Regexpblasting (REB), a regular expression blasting algorithm based on multiply aligned sequences (2009)
  7. Yue, Feng; Shi, Jian; Tang, Jijun: Simultaneous phylogeny reconstruction and multiple sequence alignment (2009)
  8. Bernardes, Juliana S.; Davila, Alberto M. R.; Costa, Vitor S.; Zaverucha, Gerson: Improving model construction of profile HMMs for remote homology detection through structural alignment (2007)
  9. Ma, Bin; Wang, Lusheng; Li, Ming: Near optimal multiple alignment within a band in polynomial time (2007)
  10. Shah, Anuj R.; Oehmen, Christopher S.; Harper, Jill; Webb-Robertson, Bobbie-Jo M.: Integrating subcellular location for improving machine learning models of remote homology detection in eukaryotic organisms (2007)
  11. Xia, Xuhua: Bioinformatics and the cell. Modern computational approaches in genomics, proteomics and transcriptomics (2007)
  12. Bourdon, Jérémie; Mancheron, Alban: Statistical properties of similarity score functions (2006)
  13. Vijaya, P. A.; Murty, M. Narasimha; Subramanian, D. K.: Efficient median based clustering and classification techniques for protein sequences (2006)
  14. Liu, Xin; Xia, Huaxia; Chien, Andrew A.: Validating and scaling the MicroGrid: A scientific instrument for grid dynamics (2005)
  15. Mahoui, Malika; Lu, Lingma; Gao, Ning; Li, Nianhua; Chen, Jessica; Bukhres, Omran; Miled, Zina Ben: A dynamic workflow approach for the integration of bioinformatics services (2005)
  16. Webb-Robertson, Bobbie-Jo; Oehmen, Christopher; Matzke, Melissa: SVM-BALSA: Remote homology detection based on Bayesian sequence alignment (2005)
  17. Sugie, Takashige; Ito, Tomoyoshi; Ebisuzaki, Toshikazu: A special-purpose computer for exploring similar biological sequences: bioler-2 with multi-pipeline and multi-sequence architecture (2004)
  18. Mukhopadhyay, Snehasis; Tang, Changhong; Huang, Jeffery; Palakal, Mathew: Genetic sequence classification and its application to cross-species homology detection (2003)
  19. Kohonen, Teuvo; Somervuo, Panu: How to make large self-organizing maps for nonvectorial data. (2002)
  20. Ong, Twee-Hee; Tan, Kian-Lee; Wang, Hao: Indexing genomic databases for fast homology searching (2002)