Noisy: identification of problematic columns in multiple sequence alignments.

MOTIVATION: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. Most protein-coding genes show dramatic variation in substitution rates, and these rates are correlated along the sequence. This often produces a patchwork of (i) phylogenetically informative and (ii) effectively randomized regions. Furthermore, alignment errors accumulate in highly variable regions and sometimes produce misleading signals in phylogenetic reconstruction.

RESULTS: We present a method that identifies phylogenetically uninformative homoplastic sites in a multiple sequence alignment by assessing the distribution of character states along a cyclic ordering of the taxa. Removing these sites appears to improve the performance of phylogenetic reconstruction algorithms, as measured by various indices of "tree quality". In particular, we obtain more stable trees because phylogenetically incompatible sites, which most likely represent strongly randomized characters, are excluded.

SOFTWARE: The computer program noisy implements this approach. It can improve phylogenetic reconstruction with a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) the data set contains sufficiently many taxa (at least around 12 to 15). The software is available under the GNU General Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.
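The idea behind the RESULTS paragraph is that a column carrying phylogenetic signal should have its character states clustered along a cyclic ordering of the taxa, while a randomized column should not. The following Python sketch illustrates that idea. It is not noisy's actual scoring procedure. It assumes the circular order of taxa is already given (how noisy derives it is not covered here) and treats gap characters as ordinary states. It also uses a simple statistic chosen for illustration: the number of state changes between cyclically adjacent taxa, rescaled between its minimum (each state forms one contiguous arc) and its exact expectation under a random rearrangement of the same states. The function names, the toy alignment, and the cutoff `threshold=0.5` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Sequence


def cyclic_changes(column: Sequence[str]) -> int:
    """Count cyclically adjacent taxon pairs (last wraps to first) whose states differ."""
    n = len(column)
    return sum(column[i] != column[(i + 1) % n] for i in range(n))


def expected_changes(column: Sequence[str]) -> Fraction:
    """Exact expected cyclic_changes if the same states were randomly permuted over the taxa.

    Each of the n adjacent pairs holds two distinct taxa, which share a state with
    probability sum_k c_k * (c_k - 1) / (n * (n - 1)), where c_k counts state k.
    """
    n = len(column)
    if n < 2:
        return Fraction(0)
    same = Fraction(sum(c * (c - 1) for c in Counter(column).values()), n * (n - 1))
    return n * (1 - same)


def noise_score(column: Sequence[str]) -> float:
    """Illustrative score: 0.0 if every state forms one contiguous arc on the circle,
    about 1.0 if the column looks like a random arrangement, above 1.0 if even worse."""
    k = len(set(column))
    minimum = k if k >= 2 else 0  # fewest possible changes: one block per state
    expected = expected_changes(column)
    if expected <= minimum:  # all arrangements look alike (constant or singleton column)
        return 0.0
    return float((cyclic_changes(column) - minimum) / (expected - minimum))


def filter_noisy_columns(alignment: Dict[str, str], circular_order: List[str],
                         threshold: float = 0.5) -> Dict[str, str]:
    """Drop columns whose noise_score exceeds a hypothetical threshold."""
    length = len(next(iter(alignment.values())))
    keep = [j for j in range(length)
            if noise_score([alignment[t][j] for t in circular_order]) <= threshold]
    return {t: "".join(seq[j] for j in keep) for t, seq in alignment.items()}


if __name__ == "__main__":
    order = ["t1", "t2", "t3", "t4", "t5", "t6"]  # hypothetical circular order
    aln = {"t1": "AAC", "t2": "AAG", "t3": "ACC", "t4": "GCG", "t5": "GCC", "t6": "GAG"}
    print(filter_noisy_columns(aln, order))
    # {'t1': 'AA', 't2': 'AA', 't3': 'AC', 't4': 'GC', 't5': 'GC', 't6': 'GA'}
```

In the toy example, the first two columns split the circle into contiguous arcs and score 0.0. The third column alternates C/G around the circle, scores 2.5, and is removed. Rescaling against the exact expectation, rather than using a raw count, keeps scores comparable between columns with different state frequencies. Columns in which every arrangement gives the same count, such as constant columns or a single deviating taxon, carry no information about the order and are left in place.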
References in zbMATH (referenced in 5 articles)
- Prohaska, Sonja J.; Berkemer, Sarah J.; Gärtner, Fabian; Gatter, Thomas; Retzlaff, Nancy; Höner zu Siederdissen, Christian; Stadler, Peter F.: Expansion of gene clusters, circular orders, and the shortest Hamiltonian path problem (2018)
- DeBlasio, Dan; Kececioglu, John: Parameter advising for multiple sequence alignment (2017)
- Manolopoulou, Ioanna; Hille, Axel: BPEC: An R Package for Bayesian Phylogeographic and Ecological Clustering (2016, arXiv)
- Kim, Jaebum; Sinha, Saurabh: Towards realistic benchmarks for multiple alignments of non-coding sequences (2010)
- Schreiber, Fabian; Pick, Kerstin; Erpenbeck, Dirk; Wörheide, Gert; Morgenstern, Burkhard: Orthoselect: a protocol for selecting orthologous groups in phylogenomics (2009)