Abstract Detail
Nº613/1080 - Assessing the power of artificial intelligence approaches for birth-death model classification
Format: ORAL
Authors
Pablo Gutiérrez de la Peña1*, Guillermo Iglesias Hernández2, Isabel Sanmartín1 & Andrea Sánchez Meseguer1
Affiliations
1 Real Jardín Botánico, CSIC, Madrid, Spain
2 Universidad Politécnica de Madrid, Madrid, Spain
Abstract
Birth-death (BD) models applied to dated phylogenies are a useful tool for studying past diversification dynamics in the absence of a complete fossil record. Parameters in these stochastic models are typically inferred using likelihood-based methods such as Maximum Likelihood or Bayesian Inference. However, these methods require the formulation of a new likelihood algorithm each time a new model is proposed, and some of the most complex models are also computationally intractable or mathematically non-identifiable, with two models generating identical probability distributions. Recent years have witnessed a revolution, with artificial intelligence (AI) methods applied to phylogenetic inference, species delimitation, and phylodynamics. However, the power of these approaches in birth-death modeling remains almost unexplored. Here, we tackle a classification problem, assessing the power of AI algorithms to discriminate among six different diversification scenarios: constant birth-death (BD), high extinction (HE), mass extinction (ME), stasis and radiate (SR), diversity-dependent (DD), and waxing and waning (WW). We simulated 10,000 trees under each diversification scenario and encoded the phylogenies using two different representation techniques: a set of summary statistics and the compact bijective ladderized vector encoding. These simulations were used to train and validate two different AI methods, convolutional neural networks (CNN) and random forests (RF), and the trained algorithms were used to predict the most probable model of diversification for selected empirical phylogenies. Finally, we compared the performance of AI methods with previous likelihood-based approaches. Cross-validation showed that the percentage of correct assignments for the simulated scenarios was high for the two AI algorithms, except for the BD and SR models. The most accurate strategy was the combination of the RF algorithm with summary statistics as the tree representation.
We hypothesize that the hierarchical structure of phylogenetic trees is more difficult to capture using CNNs designed for image-based encodings.
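As an illustration of the simulate-then-encode step described in the abstract, the following is a minimal sketch and not the authors' pipeline. It simulates constant-rate birth-death trees forward in time, keeps the branching times of the reconstructed (surviving) tree, and encodes each tree as a short vector of summary statistics. The rates, the time span, the choice of statistics (tip count, crown age, mean node age, and the gamma statistic of Pybus and Harvey) and all function names are assumptions made for this example. The abstract does not specify them.

```python
import math
import random


def simulate_bd_branching_times(birth, death, t_max, rng):
    """Simulate a constant-rate birth-death process from one lineage and return
    the branching times of the reconstructed tree as ages before the present,
    sorted from oldest to youngest. Returns [] if fewer than two tips survive."""
    parent = [-1]        # parent index of every node ever created
    split_time = [None]  # time at which a node speciated (None if it never did)
    alive = [0]          # indices of lineages alive at the current time
    t = 0.0
    while alive:
        t += rng.expovariate(len(alive) * (birth + death))
        if t >= t_max:
            break
        i = alive.pop(rng.randrange(len(alive)))
        if rng.random() < birth / (birth + death):
            split_time[i] = t  # speciation: lineage i is replaced by two children
            for _ in range(2):
                parent.append(i)
                split_time.append(None)
                alive.append(len(parent) - 1)
        # otherwise lineage i goes extinct and is simply dropped
    has_survivor = [False] * len(parent)
    surviving_children = [0] * len(parent)
    for i in alive:
        has_survivor[i] = True
    # Children always have a larger index than their parent, so a reverse
    # sweep visits every descendant before its ancestor.
    for i in range(len(parent) - 1, 0, -1):
        if has_survivor[i]:
            surviving_children[parent[i]] += 1
            has_survivor[parent[i]] = True
    ages = [t_max - split_time[i]
            for i in range(len(parent)) if surviving_children[i] == 2]
    return sorted(ages, reverse=True)


def gamma_statistic(ages):
    """Gamma statistic computed from branching ages sorted oldest first.
    Negative values indicate nodes concentrated near the root."""
    n = len(ages) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma needs at least three tips")
    bounds = list(ages) + [0.0]
    # weighted[k - 2] = k * g_k, where g_k is the time spent with k lineages
    weighted = [k * (bounds[k - 2] - bounds[k - 1]) for k in range(2, n + 1)]
    total = sum(weighted)
    running, partial = 0.0, 0.0
    for w in weighted[:-1]:
        running += w
        partial += running
    return (partial / (n - 2) - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


def summary_vector(ages):
    """Encode one tree as [tip count, crown age, mean node age, gamma]."""
    return [len(ages) + 1, ages[0], sum(ages) / len(ages), gamma_statistic(ages)]


def simulate_dataset(scenarios, n_trees, t_max, seed=0):
    """Return feature rows X and labels y, with n_trees trees per scenario.
    Trees with fewer than three surviving tips are discarded and resimulated."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (birth, death) in scenarios.items():
        kept = 0
        while kept < n_trees:
            ages = simulate_bd_branching_times(birth, death, t_max, rng)
            if len(ages) < 2:
                continue
            X.append(summary_vector(ages))
            y.append(label)
            kept += 1
    return X, y


if __name__ == "__main__":
    # Hypothetical rates for two of the six scenarios named in the abstract.
    scenarios = {"BD": (1.0, 0.2), "HE": (1.0, 0.9)}
    X, y = simulate_dataset(scenarios, n_trees=100, t_max=4.0)
    print(y[0], X[0])
```

The resulting rows and labels could then be passed to any standard classifier, such as a random forest. The other scenarios (ME, SR, DD, WW) would need time-varying or diversity-dependent rates, which this sketch does not cover.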