Please use this identifier to cite or link to this item: https://hdl.handle.net/1959.11/20616
Title: Specimens at the Center: An Informatics Workflow and Toolkit for Specimen-level Analysis of Public DNA Database Data
Contributor(s): Pham, Kasey K (author); Ford, Bruce A (author); Gebauer, Sebastian (author); Gehrke, Berit (author); Hoffmann, Matthias H (author); Hoshino, Takuji (author); Jimenez-Mejias, Pedro (author); Jung, Jongduk (author); Kim, Sangtae (author); Luceno, Modesto (author); Maguilla, Enrique (author); Hahn, Marlene (author); Martin-Bravo, Santiago (author); Naczi, Robert F C (author); Reznicek, Anton A (author); Roalson, Eric H (author); Simpson, David A (author); Starr, Julian R (author); Villaverde, Tamara (author); Waterway, Marcia J (author); Wilson, Karen L (author); Yano, Okihito (author); Lueders, Kate (author); Zhang, Shuren (author); Hipp, Andrew L (author); Brown, Bethany H (author); Bruederle, Leo P (author); Bruhl, Jeremy J (author); Chung, Kyong-Sook (author); Derieg, Nathan J (author); Escudero, Marcial (author)
Corporate Author: Global Carex Group (GCG)
Publication Date: 2016
DOI: 10.1600/036364416x692505
Handle Link: https://hdl.handle.net/1959.11/20616
Abstract: Major public DNA databases, namely NCBI GenBank, the DNA DataBank of Japan (DDBJ), and the European Molecular Biology Laboratory (EMBL), are invaluable biodiversity libraries. Systematists and other biodiversity scientists commonly mine these databases for sequence data to use in phylogenetic studies, but such studies generally use only the taxonomic identity of the sequenced tissue, not the specimen identity. Thus studies that use DNA supermatrices to construct phylogenetic trees with species at the tips typically do not take advantage of the fact that for many individuals in the public DNA databases, several DNA regions have been sampled, and for many species, two or more individuals have been sampled. Consequently, these studies typically do not make full use of the multigene datasets in public DNA databases to test species coherence and select optimal sequences to represent a species. In this study, we introduce a set of tools developed in the R programming language to construct individual-based trees from NCBI GenBank data and present a set of trees for the genus 'Carex' (Cyperaceae) constructed using these methods. For the more than 770 species for which we found sequence data, our approach recovered an average of 1.85 gene regions per specimen (up to seven for some specimens) and identified more than 450 species represented by two or more specimens. Depending on the subset of genes analyzed, up to 42% of species were monophyletic. We introduce a simple tree statistic, the Taxonomic Disparity Index (TDI), to assist in curating specimen-level datasets and provide code for selecting maximally informative (or, conversely, minimally misleading) sequences as species exemplars. While tailored to the 'Carex' dataset, the approach and code presented in this paper can readily be generalized to construct individual-level trees from large amounts of data for any species group.
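The core idea of a specimen-level analysis is to key each sequence to the individual it came from rather than only to its species name. The short Python sketch below illustrates that grouping step and the two kinds of summary figures the abstract reports (mean gene regions per specimen, number of species with two or more specimens). It is not the authors' R toolkit and does not implement the TDI, which this record does not define. It assumes that a specimen can be identified by a (species, voucher) pair. The toy records and voucher strings are hypothetical. Real GenBank voucher fields would likely need cleaning before they match reliably.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (species, specimen voucher, gene region).
# The voucher strings are invented for illustration; real data would be
# parsed from GenBank entries and would likely need normalization first.
records = [
    ("Carex pensylvanica", "Voucher A", "ITS"),
    ("Carex pensylvanica", "Voucher A", "ETS"),
    ("Carex pensylvanica", "Voucher A", "matK"),
    ("Carex pensylvanica", "Voucher B", "ITS"),
    ("Carex lurida", "Voucher C", "ITS"),
    ("Carex lurida", "Voucher C", "ETS"),
]


def summarize_specimens(records):
    """Group sequences by specimen and return simple coverage statistics.

    Assumption: a specimen is identified by its (species, voucher) pair.
    """
    # Map each specimen to the set of distinct gene regions sampled from it.
    regions_by_specimen = defaultdict(set)
    for species, voucher, gene in records:
        regions_by_specimen[(species, voucher)].add(gene)

    # Count how many distinct specimens represent each species.
    specimens_per_species = defaultdict(int)
    for species, _voucher in regions_by_specimen:
        specimens_per_species[species] += 1

    # Mean number of distinct gene regions per specimen.
    mean_regions = mean(len(genes) for genes in regions_by_specimen.values())
    # Number of species represented by two or more specimens.
    multi_specimen_species = sum(
        1 for n in specimens_per_species.values() if n >= 2
    )
    return regions_by_specimen, mean_regions, multi_specimen_species


specimens, avg_regions, multi = summarize_specimens(records)
print(f"{len(specimens)} specimens, {avg_regions:.2f} regions per specimen, "
      f"{multi} species with 2+ specimens")
```

With these toy records, the three specimens carry three, one, and two regions, so the mean is 2.00, and only 'Carex pensylvanica' has two specimens. Treating the voucher as part of the key is what lets sequences of different genes from the same individual be concatenated, which is the information a species-only supermatrix discards.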
Publication Type: Journal Article
Source of Publication: Systematic Botany, 41(3), p. 529-539
Publisher: American Society of Plant Taxonomists
Place of Publication: United States of America
ISSN: 1548-2324
0363-6445
Fields of Research (FoR) 2008: 060310 Plant Systematics and Taxonomy
Fields of Research (FoR) 2020: 310411 Plant and fungus systematics and taxonomy
310410 Phylogeny and comparative analysis
310402 Biogeography and phylogeography
Socio-Economic Objective (SEO) 2008: 960805 Flora, Fauna and Biodiversity at Regional or Larger Scales
Socio-Economic Objective (SEO) 2020: 280102 Expanding knowledge in the biological sciences
180606 Terrestrial biodiversity
Peer Reviewed: Yes
HERDC Category Description: C1 Refereed Article in a Scholarly Journal
Appears in Collections: Journal Article

Files in This Item: 2 files

Scopus™ citations: 7 (checked on Apr 6, 2024)
Page views: 1,086 (checked on Mar 8, 2023)
Downloads: 4 (checked on Mar 8, 2023)

Items in Research UNE are protected by copyright, with all rights reserved, unless otherwise indicated.