Nº613/1712 - Applying the phylogenomics package Captus to polyploid, contaminated, and degraded samples
Format: ORAL
Authors
Edgardo M. Ortiz1, Gentaro Shigita1,2, Alina Höwener3, Mustafa Raza1, Hanno Schaefer1
Affiliations
1 Technical University of Munich, Freising, Germany
2 University of Tsukuba, Tsukuba, Japan
3 Ludwig Maximilian University of Munich, Munich, Germany
Abstract
Our recently developed software package, Captus (available at https://github.com/edgardomortiz/Captus), greatly simplifies processing high-throughput sequencing data into alignments ready for phylogenetic analyses. Captus can remove paralogs from these alignments for use in traditional species tree estimation methods or include the complete set of recovered paralogs for use in more recent methods such as ASTRAL-Pro.
Although Captus normally searches for previously known sets of orthologs (e.g., Angiosperms353, BUSCO lineage databases), it can also be used to discover new putative homologs across the samples analyzed, making it possible to extract phylogenetically informative loci from any group of organisms.
Besides rapidly and accurately recovering a greater number of more complete loci across more samples, Captus can also deal with several common issues that arise in current phylogenomics practice. We show that accurate recovery helps minimize methodological sources of conflict between gene trees and the species tree, revealing clear patterns of reticulation or hybridization.
Captus facilitates combining sequence data from samples of varying quality (e.g., silica-preserved, herbarium), different data types (i.e., target capture, high- and low-depth whole-genome sequencing, and RNA-Seq), and pre-assembled genomes and transcriptomes into a single phylogenomic dataset for analysis. Captus can also identify outlier samples containing an excess of paralogs due to polyploidy or possible contamination, and it can remove phylogenetically distant contaminants from a sample.
Finally, we present an additional feature of Captus: an easy-to-use workflow to find putative homologs and design baits for target enrichment. We used this workflow to create a bait set that captures 989 protein-coding genes (854 single-copy and 135 associated with morphological traits of interest) as well as 177 non-coding regions for the family Cucurbitaceae.