Scientific Area
Species Delimitation 4.0: A new approach for machine-learning-based integrative taxon-omics in plants
ID: 613 / 274
Category: Abstract
Track: Pending
Proposed Symposium Title: Species Delimitation 4.0: A new approach for machine-learning-based integrative taxon-omics in plants
Authors:
Karbstein, Kevin (1, 2, *); Kösters, Lara (1); Hodač, Ladislav (1); Hofmann, Martin (2); Mäder, Patrick (2, 3, 4); Wäldchen, Jana (1, 3)
Affiliations:
1 Max Planck Institute for Biogeochemistry, Jena, Germany
2 Technical University of Ilmenau, Institute for Computer and Systems Engineering, Ilmenau, Germany
3 German Centre for Integrative Biodiversity Research (iDiv), Leipzig, Germany
4 Friedrich Schiller University, Institute of Ecology and Evolution, Jena, Germany
* corresponding author: kkarb@bgc-jena.mpg.de
Abstract:
Species are the central units of taxonomic research and of measuring Earth’s biodiversity. Recent findings in evolutionary genomics are raising awareness that what we call species may be ill-founded entities, because many were described solely on morphological grounds or from regional studies. This applies particularly to plant groups shaped by intricate evolutionary processes such as hybridization, polyploidy, and/or apomixis. Here, the challenges of integrative taxonomy, that is, combining genetic/genomic data with morphological, ecological, and other datasets, become apparent: (i) different favored species concepts (e.g., genetic vs. morphological concepts), (ii) a lack of appropriate analytical tools for intricate evolutionary processes (which often violate the assumptions of popular multispecies coalescent [MSC]-based models), and (iii) a highly subjective ranking and fusion of datasets for final taxonomic treatments (e.g., deciding whether genetics or morphology is taxonomically more important). Integrative taxon-omics combined with machine learning (ML) under a unified species concept may now enable systematic data integration and thus reduce subjectivity in species delimitation (“species delimitation 4.0”). Recent ML applications rely predominantly on deep learning, that is, automated feature extraction and learning with artificial neural networks (ANNs). We therefore tested different supervised (trained on labeled datasets) and unsupervised (using only the inherent structure of the test datasets) ANN-based delimitation approaches, first to evaluate genomic data and then to fuse the most likely species scenarios stepwise with other data sources in successive combinations (e.g., ‘genomics’, ‘genomics+morphology’, ‘genomics+morphology+ecology’), using coarse- to fine-grained integrative taxon-omics datasets. Accuracy scores, confusion matrices, and bootstrapping were applied to evaluate the results.
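The abstract does not state how data fusion and evaluation were implemented. As a minimal sketch only, assuming early fusion by simple concatenation of per-specimen feature vectors and a percentile bootstrap that resamples specimens (both assumptions, not necessarily the authors' choices), the stepwise fusion and the three evaluation measures could look like this in plain Python. All function names (`fuse_blocks`, `accuracy`, `confusion_matrix`, `bootstrap_accuracy`) are hypothetical.

```python
import random


def fuse_blocks(*blocks):
    """Early fusion (assumed): concatenate per-specimen feature vectors from
    several data blocks, e.g. genomics, then + morphology, then + ecology."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all blocks must describe the same specimens")
    return [sum((list(b[i]) for b in blocks), []) for i in range(n)]


def accuracy(y_true, y_pred):
    """Fraction of specimens whose predicted species matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows = reference species, columns = predicted species."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def bootstrap_accuracy(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy, resampling specimens
    with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    k = int(n_boot * alpha / 2)  # number of scores cut from each tail
    return scores[k], scores[n_boot - 1 - k]


if __name__ == "__main__":
    # Toy reference and predicted species labels for six specimens.
    y_true = ["A", "A", "B", "B", "C", "C"]
    y_pred = ["A", "B", "B", "B", "C", "C"]
    print(accuracy(y_true, y_pred))  # 5 of 6 correct
    print(confusion_matrix(y_true, y_pred, ["A", "B", "C"]))
    print(bootstrap_accuracy(y_true, y_pred))
    # Stepwise fusion: genomic features first, then morphological ones added.
    print(fuse_blocks([[0.1, 0.2], [0.3, 0.4]], [[5.0], [6.0]]))
```

Concatenation is the simplest fusion scheme; a real pipeline might instead weight or separately encode each data block before combining them.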
In addition, strategies for visualizing the most important features in ANN classification and clustering (explainable AI, “XAI”) and current limitations of ANNs will be discussed. ML-based integrative taxon-omics may help delimit species less subjectively, more reliably, and more rapidly than traditional methods, and may thus help revise and unravel plant diversity, also on a global scale.
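The XAI strategies to be discussed are not specified here. One common model-agnostic option, swapped in purely for illustration and not necessarily the one used in this work, is permutation feature importance: shuffle one feature column across specimens and record how much accuracy drops. The sketch assumes a trained classifier exposed as a hypothetical `predict` function that returns one species label per specimen.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of specimens whose predicted species matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled.

    `predict` (hypothetical) maps a list of feature vectors to a list of
    species labels; larger returned values mean the feature matters more.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace feature j with its shuffled values, keep all others.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy classifier (hypothetical) that only looks at feature 0.
    def predict(X):
        return ["A" if row[0] > 0.5 else "B" for row in X]

    X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
    y = ["A", "A", "B", "B"]
    print(permutation_importance(predict, X, y))
```

Because the toy classifier ignores feature 1, shuffling that column never changes a prediction and its importance is exactly 0.0, while feature 0 receives a non-negative score.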