Abstract Detail

Nº613/1600 - Strategies to improve alignment accuracy in large-scale plant phylogenomic analyses
Format: ORAL
Authors
Michael Gruenstaeudl
Affiliations
Department of Biological Sciences, Fort Hays State University, Kansas, USA
Abstract
Despite the large quantities of nucleotide sequences that are analyzed in contemporary plant phylogenomics, the process of multiple sequence alignment (MSA) remains one of its central components. Any phylogenomic investigation will only be as reliable as the positional homology established among its underlying sequences, irrespective of their length, genomic origin, or taxonomic representation. Diversifying or further increasing the number of input sequences is not a substitute for careful and often time-consuming MSA. Instead, the ever-larger data volumes in plant phylogenomics highlight the demand for analysis workflows that accommodate the complexities of nucleotide sequence evolution, especially at the microstructural level. In this presentation, I argue that despite the skyrocketing amounts of sequence data in contemporary plant phylogenomics, the associated MSA process must not be dismissed as a cumbersome side aspect. Specifically, I highlight that while the existing toolkit of MSA software appears sufficiently diverse to tackle the challenges of large-scale plant phylogenomics, the integration of these tools into phylogenomic analysis pipelines awaits substantial improvement, especially when accounting for genome-specific idiosyncrasies and microstructural mutations. Several empirical examples are presented to support these observations. First, I demonstrate that the correct assessment of positional homology in genome-level sequence data can significantly improve plastid phylogenomic reconstructions in a lineage of South American sunflowers. Second, I illustrate that the manual post-processing of software-generated MSAs significantly improves the reliability of phylogenomic reconstructions among water lilies and that such post-processing can be achieved with reasonable effort even for phylogenomic data. Third, I introduce a software tool that enables the automatic separation of genome sequence data into natural partitions - genes, introns, and intergenic spacers - across thousands of input genomes and automates the alignment of each partition with different MSA software tools. Based on these results, I highlight the need for an increased effort to develop targeted and customized MSA strategies for plant phylogenomics.
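The partitioning step described in the third example can be illustrated with a minimal sketch. This is not the tool presented in the talk. It assumes a hypothetical annotation format in which each feature is a tuple of name, kind, start, and end (0-based, end-exclusive), and the function names `split_into_partitions` and `group_by_region` are invented for illustration. Each group returned by `group_by_region` collects one region across all taxa, which is the unit that would then be written to a file and passed to an MSA program of choice.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation record: (name, kind, start, end), 0-based, end-exclusive.
Feature = Tuple[str, str, int, int]


def split_into_partitions(genome: str, features: List[Feature]) -> Dict[str, Dict[str, str]]:
    """Cut one genome sequence into genes, introns, and intergenic spacers."""
    partitions: Dict[str, Dict[str, str]] = {"gene": {}, "intron": {}, "spacer": {}}

    # Annotated genes and introns are copied out by their coordinates.
    for name, kind, start, end in features:
        if kind in ("gene", "intron"):
            partitions[kind][name] = genome[start:end]

    # Spacers are the unannotated stretches between neighbouring genes.
    # A running maximum of gene ends keeps overlapping or nested genes
    # from producing spurious spacers.
    genes = sorted((f for f in features if f[1] == "gene"), key=lambda f: f[2])
    prev_name, prev_end = None, 0
    for name, _, start, end in genes:
        if prev_name is not None and start > prev_end:
            partitions["spacer"][f"{prev_name}_{name}"] = genome[prev_end:start]
        if prev_name is None or end > prev_end:
            prev_name, prev_end = name, end
    return partitions


def group_by_region(
    genomes: Dict[str, Tuple[str, List[Feature]]]
) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Collect each region across taxa so it can be aligned on its own."""
    grouped: Dict[Tuple[str, str], Dict[str, str]] = {}
    for taxon, (sequence, features) in genomes.items():
        for kind, regions in split_into_partitions(sequence, features).items():
            for name, subsequence in regions.items():
                grouped.setdefault((kind, name), {})[taxon] = subsequence
    return grouped
```

Keeping the partitioning separate from the alignment call means that the same per-region groups can be handed to different MSA programs without re-parsing the genomes.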