Nº613/348 - Cross-platform and containerised phylogenomics-bioinformatics pipelines for sequence capture data: hybpiper-nf and paragone-nf
Format: ORAL
Authors
Alexander N. Schmidt-Lebuhn¹, Chris Jackson², Todd McLay³
Affiliations
1 Centre for Australian National Biodiversity Research (a joint venture of Parks Australia and CSIRO), Canberra, Australia
2 Royal Botanic Gardens Victoria, Melbourne, Australia
3 CSIRO, Melbourne, Australia
Abstract
The phylogenomics activity of the Genomics for Australian Plants consortium uses target capture / enrichment data. Early in the process, a phylogenomics-bioinformatics working group was established to evaluate options for the consortium's technical approach, ranging from the desired sequencing depth through assembly to the phylogenetic analysis itself. For the analysis pipeline, paralogy was identified as a key concern because phylogenetic analyses were to be conducted both at the scale of all angiosperms and in selected genera and plant families known to be characterised by ancestral genome duplication events. We adapted the well-established HybPiper software for sequence assembly and paralog discovery, and a collection of previously published scripts for the inference of ortholog groups from gene tree topologies. Over the course of our work, we iteratively built an integrated workflow in which the HybPiper outputs are provided in a format directly usable for ortholog inference, discovered and fixed bugs, and added alternative options for several steps such as assembly, alignment, and gene tree inference. The resulting pipelines, hybpiper-nf and paragone-nf, have been published as two Nextflow scripts with a Singularity container that provides all dependencies, allowing easy installation independent of platform, and recently also as Python packages that can be installed with conda. We will discuss the history of their development, the principles behind the analyses, and important options available to the user.