New approaches for machine-learning-based integrative taxon-omics in plants

ID: 613 / 274

Category: Abstract

Track: Pending

Proposed Symposium Title: New approaches for machine-learning-based integrative taxon-omics in plants

Authors:

Karbstein, Kevin1,2*, Kösters, Lara1, Hodač, Ladislav1, Hofmann, Martin2, Mäder, Patrick2,3,4, Wäldchen, Jana1,3

Affiliations:

1 Max Planck Institute for Biogeochemistry, Jena, Germany
2 Technical University of Ilmenau, Institute for Computer and Systems Engineering, Ilmenau, Germany
3 German Centre for Integrative Biodiversity Research (iDiv), Leipzig, Germany
4 Friedrich Schiller University, Institute of Ecology and Evolution, Jena, Germany
* Corresponding author: kkarb@bgc-jena.mpg.de

Abstract:

Species are the central units of taxonomic research and of measuring Earth’s biodiversity. Recent findings in evolutionary genomics are raising awareness that what we call species can be ill-founded entities when their descriptions rest solely on morphology or on regional sampling. This particularly applies to plant groups characterized by intricate evolutionary processes such as hybridization, polyploidy, and/or apomixis. Here, the challenges of integrative taxon-omics, that is, genomics combined with morphological, ecological, and other datasets, become apparent: (i) different favored species concepts (e.g., genetic vs. morphological concepts), (ii) a lack of appropriate analytical tools for intricate evolutionary processes (which often violate the assumptions of popular multispecies coalescent (MSC)-based models), and (iii) highly subjective ranking and fusion of datasets for final taxonomic treatments (e.g., whether genetics or morphology is taxonomically more important). Integrative taxon-omics combined with machine learning (ML) under a unified species concept may enable systematic data integration and thus reduce subjectivity in species classification and delimitation. Recent ML applications predominantly rely on deep learning, that is, automated feature extraction and learning based on artificial neural networks (ANNs). We will introduce supervised (trained on labeled datasets) and unsupervised (using only the inherent structure of the data, without labels) ANN approaches that fuse genomic information with other sources (e.g., ‘genomics’, ‘genomics+morphology’, ‘genomics+morphology+ecology’, etc.) using integrative taxonomic datasets. Results were evaluated with accuracy scores, confusion matrices, and bootstrapping techniques. In addition, we will discuss strategies to visualize the most important features within ANN classification or clustering processes (explainable AI, “XAI”) as well as current ANN limitations.
Integrative taxon-omics based on ML may help classify or delimit species less subjectively, more reliably, and more rapidly than traditional methods, and may thus help to revise and unravel plant diversity on a global scale as well.
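
The evaluation measures named in the abstract (accuracy, confusion matrix, bootstrapping) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy labels (`sp_A`, `sp_B`, `sp_C`), and the choice of a percentile bootstrap over specimens are assumptions made purely for illustration, and the predictions stand in for the output of any classifier.

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of specimens whose predicted species matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix); rows are true species, columns predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    return labels, [[counts[(t, p)] for p in labels] for t in labels]

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.

    Specimens are resampled with replacement; this is one common
    bootstrap variant, assumed here for illustration.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical toy data: true labels and predictions for eight specimens.
y_true = ["sp_A", "sp_A", "sp_A", "sp_B", "sp_B", "sp_C", "sp_C", "sp_C"]
y_pred = ["sp_A", "sp_A", "sp_B", "sp_B", "sp_B", "sp_C", "sp_A", "sp_C"]

acc = accuracy(y_true, y_pred)
labels, matrix = confusion_matrix(y_true, y_pred)
ci_low, ci_high = bootstrap_accuracy_ci(y_true, y_pred)

print("accuracy:", acc)
print("labels:", labels)
for row in matrix:
    print(row)
print("95% bootstrap interval:", (ci_low, ci_high))
```

Running the same evaluation for each feature set (for example ‘genomics’ alone versus ‘genomics+morphology’) would give directly comparable scores, with the bootstrap interval indicating how much of a difference could be due to specimen sampling.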

Key words: integrative taxon-omics, machine learning, reticulate evolution, species delimitation, supervised vs. unsupervised learning, taxonomically complex groups (TCGs)