
A survey of best practices for RNA-seq data analysis

REVIEW · Open Access

Ana Conesa1,2*, Pedro Madrigal3,4*, Sonia Tarazona2,5, David Gomez-Cabrero6,7,8,9, Alejandra Cervera10, Andrew McPherson11, Michał Wojciech Szcześniak12, Daniel J. Gaffney3, Laura L. Elo13, Xuegong Zhang14,15 and Ali Mortazavi16,17*

Abstract

RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping.

Jan 12, 2016 · Another important factor is sequencing depth or library size, which is the number of sequenced reads for a given sample. More transcripts will be detected and their quantification will be more precise as the sample is sequenced to a deeper level [1]. Nevertheless, optimal sequencing depth again depends on the aims of the experiment.
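The link between depth and detection can be illustrated with a simple saturation (rarefaction) sketch: subsample reads from one library at increasing depths and count how many genes reach a minimal number of reads. This is a minimal illustration under stated assumptions, not a method from the review: the gene names, the per-gene counts and the `min_reads` detection threshold are all hypothetical, and the helper names `genes_detected` and `saturation_curve` are invented for this example.

```python
import random
from collections import Counter


def genes_detected(counts, depth, min_reads=1, seed=0):
    """Subsample `depth` reads without replacement from one library and
    return the number of genes that receive at least `min_reads` reads."""
    # Expand per-gene counts into one entry per read (fine for a toy example;
    # a real library with millions of reads would need a streaming approach).
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    if not 0 <= depth <= len(pool):
        raise ValueError("depth must lie between 0 and the library size")
    rng = random.Random(seed)
    subsample = Counter(rng.sample(pool, depth))
    return sum(1 for n in subsample.values() if n >= min_reads)


def saturation_curve(counts, fractions, min_reads=1, seed=0):
    """Return (depth, genes detected) pairs for fractions of the library size."""
    library_size = sum(counts.values())  # library size = total reads in the sample
    curve = []
    for f in fractions:
        depth = int(f * library_size)
        curve.append((depth, genes_detected(counts, depth, min_reads, seed)))
    return curve


# Hypothetical per-gene read counts for a single sample.
counts = {"geneA": 500, "geneB": 50, "geneC": 5, "geneD": 1}
for depth, detected in saturation_curve(counts, [0.01, 0.1, 0.5, 1.0]):
    print(depth, detected)
```

Lowly expressed genes such as the hypothetical `geneC` and `geneD` tend to drop out first as depth decreases; when the curve flattens, extra sequencing mainly refines the counts of genes that are already detected.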


We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.

Background

Transcript identification and the quantification of gene expression have been distinct core activities in molecular biology ever since the discovery of RNA's role as the key intermediate between the genome and the proteome. The power of sequencing RNA lies in the fact that the twin aspects of discovery and quantification can be combined in a single high-throughput sequencing assay called RNA-sequencing (RNA-seq). The pervasive adoption of RNA-seq has spread well beyond the genomics community and has become a standard part of the toolkit used by the life sciences research community.

*… for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32603, USA. 3 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. 16 Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697-2300, USA. Full list of author information is available at the end of the article.

Many variations of RNA-seq protocols and analyses have been published, making it challenging for new users to appreciate all of the steps necessary to conduct an RNA-seq study. There is no optimal pipeline for the variety of different applications and analysis scenarios in which RNA-seq can be used.

Scientists plan experiments and adopt different analysis strategies depending on the organism being studied and their research goals. For example, if a genome sequence is available for the studied organism, it should be possible to identify transcripts by mapping RNA-seq reads onto the genome. By contrast, for organisms without sequenced genomes, quantification would be achieved by first assembling reads de novo into contigs and then mapping these contigs onto the transcriptome. For well-annotated genomes such as the human genome, researchers may choose to base their RNA-seq analysis on the existing annotated reference transcriptome alone, or might try to identify new transcripts and their differential regulation.
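The choice of strategy described in the preceding paragraph can be summarized as a small decision function. This is only a sketch: the enum, the function name and the three yes/no inputs are hypothetical simplifications of what is, in practice, a judgment call that also depends on annotation quality and research goals.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical labels for the three analysis routes described in the text."""
    DE_NOVO = "assemble reads de novo into contigs, then map contigs onto the transcriptome"
    REFERENCE_ANNOTATION = "base the analysis on the annotated reference transcriptome alone"
    GENOME_DISCOVERY = "map reads onto the genome and identify (possibly new) transcripts"


def choose_strategy(has_genome: bool, well_annotated: bool,
                    want_novel_transcripts: bool) -> Strategy:
    # Without a sequenced genome there is nothing to map reads onto.
    if not has_genome:
        return Strategy.DE_NOVO
    # A well-annotated genome allows relying on the existing annotation only.
    if well_annotated and not want_novel_transcripts:
        return Strategy.REFERENCE_ANNOTATION
    # Otherwise, map to the genome and look for transcripts beyond the annotation.
    return Strategy.GENOME_DISCOVERY


print(choose_strategy(has_genome=True, well_annotated=True, want_novel_transcripts=False))
```

Collapsing the two genome-based cases into a single fallback reflects the text: whenever a genome exists but the annotation alone is not enough, reads are mapped onto the genome to identify transcripts.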

Furthermore, investigators might be interested only in messenger RNA isoform expression or microRNA (miRNA) levels or allele variant identification. Both the experimental design and the analysis procedures will vary greatly in each of these cases. RNA-seq can be used solo for transcriptome profiling or in combination with other functional genomics methods to enhance the analysis of gene expression. Finally, RNA-seq can be coupled with different types of biochemical assay to analyze many other aspects of RNA biology, such as RNA-protein binding, RNA structure, or RNA-RNA interactions. These applications are, however, beyond the scope of this review as we focus on typical RNA-seq. Each experimental scenario could potentially have different optimal methods for transcript quantification, normalization, and ultimately differential expression analysis.

Moreover, quality control checks should be applied at the relevant stages of the analysis to ensure both the reproducibility and the reliability of the results. Our focus is to outline current standards and resources for the bioinformatics analysis of RNA-seq data. We do not aim to provide an exhaustive compilation of resources or software tools nor to indicate one best analysis pipeline. Rather, we aim to provide a commented guideline for RNA-seq data analysis. Figure 1 depicts a generic roadmap for experimental design and analysis using standard Illumina sequencing. We also briefly list several data integration paradigms that have been proposed and comment on their potential and limitations. We finally discuss the opportunities as well as challenges provided by single-cell RNA-seq and long-read technologies when compared to traditional short-read RNA-seq.

© 2016 Conesa et al. Open Access This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Conesa et al. Genome Biology (2016) 17:13 DOI

Experimental design

A crucial prerequisite for a successful RNA-seq study is that the data generated have the potential to answer the biological questions of interest.

This is achieved by first defining a good experimental design, that is, by choosing the library type, sequencing depth and number of replicates appropriate for the biological system under study, and second by planning an adequate execution of the sequencing experiment itself, ensuring that data acquisition does not become contaminated with unnecessary biases. In this section, we discuss both. One important aspect of the experimental design is the RNA-extraction protocol used to remove the highly abundant ribosomal RNA (rRNA), which typically constitutes over 90 % of total RNA in the cell, leaving the 1–2 % comprising messenger RNA (mRNA) that we are normally interested in. For eukaryotes, this involves choosing whether to enrich for mRNA using poly(A) selection or to deplete rRNA.
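To make these proportions concrete, consider a hypothetical library of 20 million reads sequenced from total RNA with neither poly(A) selection nor rRNA depletion, and assume (a simplification) that reads are sampled in proportion to RNA abundance:

```latex
\begin{aligned}
N &= 2\times10^{7}\ \text{reads (hypothetical)}\\
N_{\mathrm{rRNA}} &> 0.90\,N = 1.8\times10^{7}\ \text{reads}\\
N_{\mathrm{mRNA}} &\approx (0.01\ \text{to}\ 0.02)\,N = 2\times10^{5}\ \text{to}\ 4\times10^{5}\ \text{reads}
\end{aligned}
```

Under these assumptions, only a few hundred thousand of the 20 million reads would inform mRNA levels, which is why rRNA is removed, by enrichment or depletion, before sequencing.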

Poly(A) selection typically requires a relatively high proportion of mRNA with minimal degradation as measured by RNA integrity number (RIN), which normally yields a higher overall fraction of reads falling onto known exons. Many biologically relevant samples (such as tissue biopsies) cannot, however, be obtained in great enough quantity or good enough mRNA integrity to produce good poly(A) RNA-seq libraries and therefore require ribosomal depletion. For bacterial samples, in which mRNA is not polyadenylated, the only viable alternative is ribosomal depletion.

Fig. 1 A generic roadmap for RNA-seq computational analyses. The major analysis steps are listed above the lines for pre-analysis, core analysis and advanced analysis. The key analysis issues for each step that are listed below the lines are discussed in the text. Pre-analysis includes experimental design, sequencing design, and quality control. Core analyses include transcriptome profiling, differential gene expression, and functional analysis. Advanced analysis includes visualization, other RNA-seq technologies, and data integration. Abbreviations: ChIP-seq, chromatin immunoprecipitation sequencing; eQTL, expression quantitative trait loci; FPKM, fragments per kilobase of exon model per million mapped reads; GSEA, gene set enrichment analysis; PCA, principal component analysis; RPKM, reads per kilobase of exon model per million reads; sQTL, splicing quantitative trait loci; TF, transcription factor; TPM, transcripts per million.

Another consideration is whether to generate strand-preserving libraries. The first generation of Illumina-based RNA-seq used random hexamer priming to reverse-transcribe poly(A)-selected mRNA. This methodology did not retain information on which DNA strand is actually expressed [1] and therefore complicates the analysis and quantification of antisense or overlapping transcripts.

