Understanding sequencing data as compositions: an outlook and review

Title
Understanding sequencing data as compositions: an outlook and review
Publication Date
2018-08
Author(s)
Quinn, Thomas P
Erb, Ionas
Richardson, Mark F
Crowley, Tamsyn M
Type of document
Journal Article
Language
en
Entity Type
Publication
Publisher
Oxford University Press
Place of publication
United Kingdom
DOI
10.1093/bioinformatics/bty175
UNE publication id
une:1959.11/60532
Abstract

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.

Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.
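To illustrate the relative nature of counts described in the abstract, the following is a minimal Python sketch and is not taken from the article. It assumes hypothetical counts for three genes measured at two library sizes, and uses the centered log-ratio (clr) transformation, a standard CoDA tool, as one example of a transformation. The function names `closure` and `clr` are illustrative choices.

```python
import math

def closure(counts):
    """Rescale a vector of counts so that its parts sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts):
    """Centered log-ratio: log of each part relative to the geometric mean.

    Assumes every count is strictly positive; zeros need separate handling.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical counts for three genes in the same sample at two depths.
shallow = [10, 30, 60]     # library size 100
deep = [100, 300, 600]     # library size 1000, same relative abundances

# The raw counts differ tenfold, but the compositions are identical.
same_composition = all(
    math.isclose(a, b) for a, b in zip(closure(shallow), closure(deep))
)

# The clr values do not depend on the library size either.
same_clr = all(math.isclose(a, b) for a, b in zip(clr(shallow), clr(deep)))

print(same_composition, same_clr)
```

Under these assumptions, only the ratios between components carry information: multiplying every count by a constant changes the library size but neither the closed composition nor the clr values.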

Citation
Bioinformatics, 34(16), p. 2870-2878
ISSN
1367-4803
Start page
2870
End page
2878
Rights
Attribution-NonCommercial 4.0 International

Files:

Name
openpublished/UnderstandingCrowley2018JournalArticle.pdf
Size
227.004 KB
Format
application/pdf
Description
Published Version