Please use this identifier to cite or link to this item: https://hdl.handle.net/1959.11/51906
Title: Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods
Contributor(s): Quinn, Thomas P (author); Crowley, Tamsyn M (author); Richardson, Mark F (author)
Publication Date: 2018-07-18
Open Access: Yes
DOI: 10.1186/s12859-018-2261-8
Handle Link: https://hdl.handle.net/1959.11/51906
Abstract: 

Background: Count data generated by next-generation sequencing assays do not measure absolute transcript abundances. Instead, the data are constrained to an arbitrary "library size" by the sequencing depth of the assay, and typically must be normalized prior to statistical analysis. The constrained nature of these data means one could alternatively use a log-ratio transformation in lieu of normalization, as often done when testing for differential abundance (DA) of operational taxonomic units (OTUs) in 16S rRNA data. Therefore, we benchmark how well the ALDEx2 package, a transformation-based DA tool, detects differential expression in high-throughput RNA-sequencing data (RNA-Seq), compared to conventional RNA-Seq methods such as edgeR and DESeq2.
Results: To evaluate the performance of log-ratio transformation-based tools, we apply the ALDEx2 package to two simulated, and two real, RNA-Seq data sets. One of the latter was previously used to benchmark dozens of conventional RNA-Seq differential expression methods, enabling us to compare transformation-based approaches with them directly. We show that ALDEx2, widely used in metagenomics research, identifies differentially expressed genes (and transcripts) from RNA-Seq data with high precision and, given sufficient sample sizes, high recall too (regardless of the alignment and quantification procedure used). Although we show that the choice of log-ratio transformation can affect performance, ALDEx2 has high precision (i.e., few false positives) across all transformations. Finally, we present a novel, iterative log-ratio transformation (now implemented in ALDEx2) that further improves performance in simulations.
Conclusions: Our results suggest that log-ratio transformation-based methods can work to measure differential expression from RNA-Seq data, provided that certain assumptions are met. Moreover, these methods have very high precision (i.e., few false positives) in simulations and perform well on real data too. With previously demonstrated applicability to 16S rRNA data, ALDEx2 can thus serve as a single tool for data from multiple sequencing modalities.
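
Illustrative note (not part of the published abstract): the idea that a log-ratio transformation can stand in for normalization can be sketched with the centered log-ratio (CLR), one common log-ratio transformation. The minimal Python sketch below is an assumption-laden illustration only. The function name `clr` and the toy counts are hypothetical, zeros are not handled, and this is not ALDEx2's actual procedure or the iterative transformation the abstract mentions. It only shows that two samples with the same relative composition but different library sizes map to the same CLR values.

```python
import math


def clr(values):
    """Centered log-ratio: log of each value minus the mean of the logs.

    Hypothetical helper for illustration; it requires strictly positive
    input, so real count data with zeros would need separate handling.
    """
    if any(v <= 0 for v in values):
        raise ValueError("clr requires strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Two toy samples with identical proportions but a 10x difference in
# sequencing depth ("library size").
shallow = [10, 20, 70]
deep = [100, 200, 700]

clr_shallow = clr(shallow)
clr_deep = clr(deep)

# The depth factor cancels inside the log-ratio, so the transformed
# values agree up to floating-point error.
same = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(clr_shallow, clr_deep)
)
print(same)
```

Because the depth factor cancels, no separate library-size normalization step is applied in this sketch. Whether that holds up on real data depends on the assumptions the abstract alludes to.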

Publication Type: Journal Article
Source of Publication: BMC Bioinformatics, v.19, p. 1-15
Publisher: BioMed Central Ltd
Place of Publication: United Kingdom
ISSN: 1471-2105
Fields of Research (FoR) 2020: 310208 Translational and applied bioinformatics
Socio-Economic Objective (SEO) 2020: 280118 Expanding knowledge in the mathematical sciences
Peer Reviewed: Yes
HERDC Category Description: C1 Refereed Article in a Scholarly Journal
Appears in Collections: Journal Article; PoultryHub Australia

Files in This Item: 2 files
File: openpublished/BenchmarkingCrowley2018JournalArticle.pdf
Description: Published version
Size: 3.22 MB
Format: Adobe PDF

Scopus Citations: 34 (checked on Apr 6, 2024)
Page view(s): 1,576 (checked on Mar 31, 2024)
Download(s): 14 (checked on Mar 31, 2024)


This item is licensed under a Creative Commons License.