Please use this identifier to cite or link to this item: https://hdl.handle.net/1959.11/51812
Title: A field guide for the compositional analysis of any-omics data
Contributor(s): Quinn, Thomas P (author); Erb, Ionas (author); Gloor, Greg (author); Notredame, Cedric (author); Richardson, Mark F (author); Crowley, Tamsyn M (author)
Publication Date: 2019-09
Early Online Version: 2019-09-23
Open Access: Yes
DOI: 10.1093/gigascience/giz107
Handle Link: https://hdl.handle.net/1959.11/51812
Abstract:

Background: Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared.

Results: Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data.

Conclusions: In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, "Relative to some important activity of the cell, what is changing?"
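As an illustration of the log-ratio idea named in the abstract, the following is a minimal sketch and not the authors' implementation. The function name `log_ratio`, its `ref_index` parameter, and the toy counts are hypothetical. It assumes strictly positive counts; zeros would need separate handling before taking logs. With no reference given it uses the geometric mean of the sample (the centered log-ratio); otherwise it uses a single chosen feature as the reference.

```python
import math


def log_ratio(counts, ref_index=None):
    """Log-ratio transform of one sample's counts (illustrative sketch).

    ref_index=None  -> reference is the sample's geometric mean (centered log-ratio).
    ref_index=k     -> reference is feature k (its own value becomes 0).
    Assumes all counts are > 0; zero handling is out of scope here.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be strictly positive before taking logs")
    logs = [math.log(c) for c in counts]
    if ref_index is None:
        # The mean of the logs is the log of the geometric mean.
        ref = sum(logs) / len(logs)
    else:
        ref = logs[ref_index]
    return [v - ref for v in logs]


# Hypothetical toy data: the same composition sequenced at two depths.
shallow = [10, 20, 70]
deep = [100, 200, 700]

clr_shallow = log_ratio(shallow)
clr_deep = log_ratio(deep)

# The transformed values agree (up to floating-point error) despite the
# tenfold difference in sequencing depth.
same = all(abs(a - b) < 1e-9 for a, b in zip(clr_shallow, clr_deep))
print(same)
```

Because each value is expressed relative to a reference taken from the same sample, the arbitrary sequencing depth cancels out, which is why the two toy samples give the same result.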

Publication Type: Journal Article
Source of Publication: GigaScience, 8(9), p. 1-14
Publisher: BioMed Central Ltd
Place of Publication: United Kingdom
ISSN: 2047-217X
Fields of Research (FoR) 2020: 310205 Proteomics and metabolomics
Socio-Economic Objective (SEO) 2020: 280102 Expanding knowledge in the biological sciences
Peer Reviewed: Yes
HERDC Category Description: C1 Refereed Article in a Scholarly Journal
Appears in Collections: Journal Article

Files in This Item:
2 files
File: openpublished/AFieldCrowley2019JournalArticle.pdf
Description: Published version
Size: 3.61 MB
Format: Adobe PDF

Scopus Citations: 156 (checked on Nov 30, 2024)
Page view(s): 1,034 (checked on May 19, 2024)
Download(s): 22 (checked on May 19, 2024)

This item is licensed under a Creative Commons License.