Nº613/2177 - Matching herbarium specimens with Mary Vaux Walcott watercolor paintings using machine learning and detective work
Format: ORAL
Authors
Mike Trizna¹, Will Weaver²
Affiliations
1 Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
2 Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
Abstract
Mary Vaux Walcott's five-volume collection of illustrations and descriptions, North American wild flowers, was published between 1925 and 1929. All of the volumes, which together contain more than 400 watercolor prints, have been fully digitized and made available through the Biodiversity Heritage Library (https://doi.org/10.5962/bhl.title.67774). Many of the illustrations were based on plant specimens collected by Walcott herself or by other Smithsonian Institution botanists, and the labels on the corresponding preserved herbarium sheets include written notations that record this connection. One example is US:1087120 (http://n2t.net/ark:/65665/3fd0d532a-72a3-4818-9dbe-904d72272759), whose label notes that it was "Painted by Mrs. Walcott" and which closely matches Plate 44 of Volume 1 (https://www.biodiversitylibrary.org/page/42602857). In this project, we used digitized specimen data to identify candidate matching specimens; machine-learning OCR and LLM-based validation tools from the VoucherVision project (https://github.com/Gene-Weaver/VoucherVision) to transcribe and clean specimen labels; and computer vision tools from LeafMachine2 (https://github.com/Gene-Weaver/LeafMachine2) to isolate morphological and phenotypic traits for comparison. With confidently matched pairs of illustrations and specimens, we investigate whether the Biodiversity Heritage Library's extensive collection of botanical illustrations can be used to extract valuable phenotypic and morphological data.
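One way to picture the candidate-matching step is a simple scoring pass over transcribed specimen records. The sketch below is hypothetical and is not the authors' pipeline: the `Specimen` and `Plate` records, their field names, the point weights, and the `min_score` threshold are all illustrative assumptions. It ranks specimens by whether the determination matches the plate's species, whether the cleaned label text contains a "Painted by Mrs. Walcott" notation, and whether Walcott is listed as collector.

```python
import re
from dataclasses import dataclass


# Hypothetical record types; real data would come from digitized specimen
# records plus OCR/LLM-cleaned label transcriptions.
@dataclass
class Specimen:
    catalog_number: str
    scientific_name: str
    collector: str
    label_text: str  # cleaned transcription of the herbarium label


@dataclass
class Plate:
    volume: int
    plate: int
    scientific_name: str


# Matches notations such as "Painted by Mrs. Walcott" (case-insensitive,
# tolerant of extra whitespace and a missing period after "Mrs").
PAINTED_NOTE = re.compile(r"painted\s+by\s+mrs\.?\s+walcott", re.IGNORECASE)


def normalize_name(name: str) -> str:
    """Lowercase and keep only genus + specific epithet (drops authorship)."""
    return " ".join(name.lower().split()[:2])


def score(specimen: Specimen, plate: Plate) -> int:
    """Assumed heuristic weights: name match 2, painting notation 2, collector 1."""
    s = 0
    if normalize_name(specimen.scientific_name) == normalize_name(plate.scientific_name):
        s += 2
    if PAINTED_NOTE.search(specimen.label_text):
        s += 2
    if "walcott" in specimen.collector.lower():
        s += 1
    return s


def rank_candidates(specimens: list[Specimen], plate: Plate, min_score: int = 3) -> list[Specimen]:
    """Return specimens scoring at least min_score, highest score first."""
    scored = [(score(sp, plate), sp) for sp in specimens]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [sp for s, sp in scored if s >= min_score]
```

In practice such a ranking would only shortlist candidates; the abstract's confident pairs still rely on label transcription and morphological comparison, which a text heuristic like this cannot replace.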