
Practical Guide to Interpreting RNA-seq Data


Skyler Kuhn (1,2), Mayank Tandon (1,2)
1. CCR Collaborative Bioinformatics Resource (CCBR), Center for Cancer Research, NCI
2. Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research


Overview

I. Experimental Design
- Hypothesis-driven
- Overview of Best Practice
II. Quality-control
- Pre- and post-alignment QC metrics
- Interpretation
III. Pipeline
- FastQ Files -> Counts matrix
- Reproducibility
IV. Downstream Analysis
- Principal Components Analysis (PCA)
- Differential Expression
- Pathway Analysis
V. Advanced Visualizations
- Group comparisons
- Alternative Splicing Events
- Pathway Diagrams

Design: Overview

Hypothesis-driven
- Addresses a well thought-out, quantifiable question
Considerations
- Library Construction: mRNA versus total RNA
- Single-end versus Paired-end Sequencing
- Sequencing Depth: quantifying gene-level or transcript-level expression
- Number of Replicates: statistical power and the ability to drop a bad sample
- Reducing Batch Effects

Design: Library Construction

Total RNA contains high levels of ribosomal RNA (rRNA): roughly 80%.
mRNA
- poly(A) selection: standard profiling for gene expression
- Low RIN may result in 3' bias
Total RNA
- rRNA depletion
- mRNA + non-coding RNA species (lncRNA)
- Prokaryotic samples

Design: Sequencing Depth

mRNA: poly(A) selection
- Recommended Sequencing Depth: 10-20M paired-end reads (or 20-40M reads)
- RNA must be high quality (RIN > 8)
Total RNA: rRNA depletion
- Recommended Sequencing Depth: 25-60M paired-end reads (or 50-120M reads)
- RNA must be high quality (RIN > 8)
* Differential isoform regulation or alternative splicing events: > 100M paired-end reads

Design: Number of Replicates

Recommended
- Biological Replicates > Technical Replicates
- Number of Replicates: 4
- Peace of mind: ability to drop a bad sample without compromising statistical power
Bare Minimum
- Biological Replicates > Technical Replicates
- Number of Replicates: 3
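The depth recommendations above quote each range twice, once in paired-end reads and once in total reads, because every paired-end fragment yields two reads. A minimal sketch of that arithmetic follows; the function names and the dictionary are illustrative assumptions, and only the numeric ranges come from the guide.

```python
# Recommended depth ranges from the guide, in millions of paired-end (PE) reads.
# The dictionary layout and function names are hypothetical, for illustration only.
RECOMMENDED_PE_DEPTH_M = {
    "mRNA": (10, 20),       # poly(A) selection
    "total RNA": (25, 60),  # rRNA depletion
}


def pe_to_total_reads(pe_reads_m: float) -> float:
    """Each paired-end fragment contributes two reads (R1 and R2)."""
    return 2 * pe_reads_m


def depth_status(library_type: str, pe_reads_m: float) -> str:
    """Report whether a planned depth is below, within, or above the range."""
    low, high = RECOMMENDED_PE_DEPTH_M[library_type]
    if pe_reads_m < low:
        return "below"
    if pe_reads_m > high:
        return "above"
    return "within"


for library_type, (low, high) in RECOMMENDED_PE_DEPTH_M.items():
    print(library_type, pe_to_total_reads(low), pe_to_total_reads(high))
```

For example, `depth_status("total RNA", 20)` reports a plan of 20M paired-end reads as below the recommended range for an rRNA-depleted library.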

Design: Reducing Batch Effects

Unwanted sources of technical variation
Decrease batch effects by uniform processing
Protocol-driven
- Different lab technicians
- Different processing times
- Different reagent lots
Sequencing
- Lane effect

Sample Name     Group   Batch   Batch*
Treatment_r1    KO      1       1
Treatment_r2    KO      2       1
Treatment_r3    KO      1       1
Treatment_r4    KO      2       1
Cntrl_r1        WT      1       2
Cntrl_r2        WT      2       2
Cntrl_r3        WT      1       2
Cntrl_r4        WT      2       2
* Confounded Groups and Batches!

Quality-control: Overview

No need to reinvent the wheel, but there are a lot of wheels!
Pre-alignment Quality-control
- Sequencing Quality
- Contamination Screening
Post-alignment Quality-control
- Alignment Quality
Aggregation and Interpretation
- MultiQC Report
- QC metric guidelines

Quality-control: Pre-alignment

Sequencing Quality
- FastQC: run twice, on raw and trimmed data
Contamination Screening
- FastQ Screen
- Kraken
- BioBloom
[Workflow figure: FastQC (raw) -> Adapter Trimming -> FastQC (trimmed) -> Contamination Screening -> Alignment]

FastQC
- Identify potential problems that can arise during sequencing or library prep
- Run on raw reads (pre-adapter removal) and trimmed reads (post-adapter removal)
- Summarizes:
  - Per base and per sequence quality scores
  - Per sequence GC content
  - Per sequence adapter content
  - Per sequence read lengths
  - Overrepresented sequences

Contamination Screen
FastQ Screen
- Aligns to Human, Mouse, Fungi, Bacteria, Viral references
- Easy to interpret and an important QC step
Kraken
- Taxonomic composition of microbial contamination
  - Archaea
  - Bacteria
  - Plasmid
  - Viral
[Figures: FastQ Screen contamination screening; Kraken + Krona microbial taxonomic composition]

Quality-control: Post-alignment

[Workflow figure: Alignment -> Quantify Counts]
Alignment Quality
Preseq
- Estimates library complexity
Picard RNAseqMetrics
- Number of reads that align to coding, intronic, UTR, intergenic, ribosomal regions
- Normalize gene coverage across a meta-gene body
  - Identify 5' or 3' bias
RSeQC
- Suite of tools to assess various aspects of post-alignment quality
  - Calculate distribution of insert size
  - Junction annotation (% known, % novel reads spanning splice junctions)
  - BAM to BigWig (visual inspection with IGV)
[Figures: Picard CollectRnaseqMetrics alignment summary; Picard CollectRnaseqMetrics normalized gene coverage]
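The sample table in the batch-effects section contrasts a balanced batch assignment (Batch) with a confounded one (Batch*). A quick check like the one below can flag the confounded case before any sequencing is done. This is a minimal sketch: the function names are hypothetical, and "fully confounded" is taken here to mean that every batch contains samples from only one group.

```python
from collections import defaultdict

# Sample sheet copied from the guide: (sample, group, batch, batch_star).
SAMPLES = [
    ("Treatment_r1", "KO", 1, 1),
    ("Treatment_r2", "KO", 2, 1),
    ("Treatment_r3", "KO", 1, 1),
    ("Treatment_r4", "KO", 2, 1),
    ("Cntrl_r1", "WT", 1, 2),
    ("Cntrl_r2", "WT", 2, 2),
    ("Cntrl_r3", "WT", 1, 2),
    ("Cntrl_r4", "WT", 2, 2),
]


def groups_per_batch(groups, batches):
    """Map each batch to the set of experimental groups it contains."""
    seen = defaultdict(set)
    for group, batch in zip(groups, batches):
        seen[batch].add(group)
    return dict(seen)


def is_fully_confounded(groups, batches):
    """True when there are several groups but each batch holds only one of them."""
    per_batch = groups_per_batch(groups, batches)
    several_groups = len(set(groups)) > 1
    return several_groups and all(len(g) == 1 for g in per_batch.values())


GROUPS = [row[1] for row in SAMPLES]
BALANCED = [row[2] for row in SAMPLES]    # the "Batch" column
CONFOUNDED = [row[3] for row in SAMPLES]  # the "Batch*" column

print("Batch  confounded:", is_fully_confounded(GROUPS, BALANCED))
print("Batch* confounded:", is_fully_confounded(GROUPS, CONFOUNDED))
```

With the Batch* assignment, any KO versus WT difference cannot be separated from the batch difference, which is why the guide marks that column as confounded.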

Quality-control: Aggregation

MultiQC
- HTML report that aggregates information across all samples
  - Plots, filtering, and highlighting
- Highly customizable with great documentation
  - Add text and embed custom figures
  - Create your own module to extend missing functionality
- Supports over 73 commonly-used open source bioinformatics tools

QC Metric Guidelines

Metric                              mRNA                              total RNA
RNA Type(s)                         Coding                            Coding + non-coding
RIN                                 > 8 [low RIN = 3' bias]           > 8
Single-end vs Paired-end            Paired-end                        Paired-end
Recommended Sequencing Depth        10-20M PE reads                   25-60M PE reads
FastQC                              Q30 > 70%                         Q30 > 70%
Percent Aligned to Reference        > 70%                             > 65%
Million Reads Aligned to Reference  > 7M PE reads (or > 14M reads)    > [...] PE reads (or > 33M reads)
Percent Aligned to rRNA             < 5%                              < 15%
Picard RNAseqMetrics                Coding > 50%                      Coding > 35%
Picard RNAseqMetrics                Intronic + Intergenic < 25%       Intronic + Intergenic < 40%

Pipeline: Conceptual Diagram

- Raw data: FastQ files
- Adapter Trimming: adapters are composed of synthetic sequences and should be removed prior to alignment
- Alignment: adding biological context to your data; find where reads align to the reference genome
- Quantification: counting the number of reads that align to a particular feature of interest (genes, isoforms, etc.)
- Differential Expression: summarizing differences between two groups or conditions (KO vs. WT)

Pipeline: Practical Example (FastQ files to raw counts)

- FastQC: pre- and post-trimming
- Cutadapt: remove adapters
- FastQ Screen: run twice on different sets of references
- STAR: splice-aware aligner
- RSEM: generates gene and isoform counts
- MultiQC: aggregates everything into an HTML report

Pipeline: Reproducibility

Workflow management systems
- Snakemake, Nextflow
Package management
- No active management: rat's nest of interdependencies prone to break
- Python: virtual environments
- Conda: Python, R, Scala, Java, C/C++, FORTRAN
- Docker or Singularity: portability and high reproducibility

Downstream Analysis

- Step 1: Think
- Step 2: Analyze
- Step 3: QC ???
- Step 4: Nobel Prize!
[Figure: pipeline stages (raw FastQ files, adapter trimming, alignment, quantification, differential expression) leading to "Answer Biological Questions"]
- Principal Components Analysis (PCA): data summarization, visualization, and QC tool
- Differential Expression: find genes that are different between groups of interest
- Pathway Enrichment: analyze for broader biological [...]

Downstream Analysis: PCA

Principal Components Analysis (PCA)
- Dimensionality reduction technique
- Captures patterns of variance into singular values
- Visualizes global transcriptomic [...]
- PCA can help drive biological [...], or be used as a QC tool

Downstream Analysis: Differential Expression

Goal: Identify genes or transcripts that vary due to biological effects
Question: Can't I just use a t-test to do that?
Answer: Sure, but with RNA-seq data that is a bad idea, so we apply normalization and/or employ specialized statistical methods.

Figure sources:
- Law, C. W., et al. (2014). "voom: Precision weights unlock linear model analysis tools for RNA-seq read counts." Genome Biol 15(2).
- Seyednasrollah, F., et al. (2015). "Comparison of software packages for detecting differential expression in RNA-seq studies." Brief Bioinform 16(1).

Rules of Thumb
- limma, DESeq2, and edgeR will perform very similarly in most cases
  - Consensus or intersection of the three is sometimes used
- limma works better with larger cohorts (7 or more samples per group)
- DESeq2 works better with small cohorts (3 or fewer per group)
  - May also be more sensitive for low-depth data
- edgeR provides convenience functions for converting to various normalized [...]
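One reason raw counts cannot be compared directly is that samples are sequenced to different depths. The sketch below illustrates the simplest library-size normalization, log2 counts per million (CPM). It is a toy illustration written for this guide, not the normalization that limma, DESeq2, or edgeR perform internally; the function names, the `prior` offset, and the toy counts are all assumptions.

```python
import math


def library_sizes(counts):
    """Total number of counted reads per sample."""
    return {sample: sum(genes.values()) for sample, genes in counts.items()}


def log2_cpm(counts, prior=1.0):
    """Return log2(CPM + prior) for every gene in every sample.

    counts: dict mapping sample name -> dict mapping gene name -> raw count.
    prior:  small offset (an assumption of this sketch) that avoids log2(0).
    """
    sizes = library_sizes(counts)
    normalized = {}
    for sample, genes in counts.items():
        if sizes[sample] == 0:
            raise ValueError(f"sample {sample!r} has no counted reads")
        normalized[sample] = {
            gene: math.log2(count / sizes[sample] * 1e6 + prior)
            for gene, count in genes.items()
        }
    return normalized


# Toy data: s2 is sequenced twice as deeply as s1 but has the same composition.
toy_counts = {
    "s1": {"geneA": 10, "geneB": 90},
    "s2": {"geneA": 20, "geneB": 180},
}
toy_log_cpm = log2_cpm(toy_counts)
print(toy_log_cpm["s1"]["geneA"], toy_log_cpm["s2"]["geneA"])
```

The raw counts for geneA differ twofold between the two toy samples, yet the normalized values agree, because the difference is explained entirely by sequencing depth.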

Downstream Analysis: Pathway Enrichment

- Gene annotation and network databases capture biological meaning
  - Manual curation, text mining
  - Gene function and/or interactions
- Dozens of databases and hundreds of tools
- Depends on how you want to look at gene-pathway [...]

Types of pathway analysis
- Simple enrichment test: Qualitative
  - Fisher's Exact Test
  - Hypergeometric test
- Enrichment algorithms: Quantitative
  - GSEA (Broad Institute)
- Network Analysis
- Commercial vs. open source
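The simple enrichment tests listed above ask whether a pathway contains more differentially expressed genes than expected by chance. A minimal sketch of the hypergeometric tail probability, which is also the p-value of a one-sided Fisher's exact test on the corresponding 2x2 table, is shown below. The function and parameter names are hypothetical, and the example numbers are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    population:   number of genes in the background (e.g. all expressed genes)
    pathway_size: number of background genes annotated to the pathway
    n_selected:   number of genes in the list of interest (e.g. DE genes)
    n_overlap:    number of selected genes that belong to the pathway
    """
    max_overlap = min(pathway_size, n_selected)
    total = comb(population, n_selected)
    favorable = sum(
        comb(pathway_size, i) * comb(population - pathway_size, n_selected - i)
        for i in range(n_overlap, max_overlap + 1)
    )
    return favorable / total


# Invented example: 10 background genes, 5 in the pathway, 5 DE genes,
# all 5 of which fall in the pathway.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
print(p_value)
```

In practice the choice of background gene set strongly affects the result, and p-values from many pathways would still need multiple-testing correction, which this sketch does not attempt.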

Advanced Visualizations of RNA-seq Data

- Group comparisons of pathway enrichment
- Heatmaps
- Visualizing set overlap
- Dotplots
- Sashimi plots (alternative splicing)
[Figure slides: group comparison of pathway enrichment (simple enrichment); expression; set overlap; pathway; Sashimi plot]

- Think BEFORE you sequence!
- This is a three-way partnership: bench, sequencing, analysis
  - Everyone should agree on experimental design, platform, approach
- QC is extremely important!

