Rsubread/Subread Users Guide - Bioconductor

October 2021
Wei Shi and Yang Liao
Olivia Newton-John Cancer Research Institute
Melbourne, Australia
Copyright 2011 - 2021

Contents

1 Introduction
2 …
    Citation
    Download and installation
    Bioconductor Rsubread package
    SourceForge Subread package
    How to get help
3 The seed-and-vote mapping paradigm
    Seed-and-vote
    Detection of short indels
    Detection of exon-exon junctions
    Detection of structural variants (SVs)
    Two-scan read alignment
    Multi-mapping reads
    Mapping of paired-end reads
4 Mapping reads generated by genomic DNA sequencing
    A quick start for using SourceForge Subread package
    A quick start for using Bioconductor Rsubread package
    Index building

    Read mapping
    Memory use and speed
    Mapping quality scores
    Mapping output
    Mapping of long reads
5 Mapping reads generated by RNA sequencing
    A quick start for using SourceForge Subread package
    A quick start for using Bioconductor Rsubread package
    Index building
    Local read alignment
    Global read alignment
    Memory use and speed
    Mapping output
    Mapping microRNA sequencing reads (miRNA-seq)
6 Read …
    Introduction
    featureCounts
    … data
    … format
    … annotations
    … and paired-end reads
    … reads to features and meta-features
    … multi-mapping reads and multi-overlapping reads
    … filtering
    … manipulation
    … output
    Program usage
    A quick start for featureCounts in SourceForge Subread

    A quick start for featureCounts in Bioconductor Rsubread
7 Quantify single-cell RNA-seq …
    cellCounts
8 SNP calling
    Algorithm
    exactSNP
9 Utility programs
    repair
    flattenGTF
    promoterRegions
    propmapped
    qualityScores
    removeDup
    subread-fullscan
    txUnique
10 Case …
    A Bioconductor R pipeline for analyzing RNA-seq data

Chapter 1 Introduction

The Subread/Rsubread packages comprise a suite of high-performance software programs for processing next-generation sequencing data. Included in these packages are the Subread aligner, the Subjunc aligner, the Sublong long-read aligner, the Subindel long indel detection program, the featureCounts read quantification program, the exactSNP SNP calling program and other utility programs.

This document provides a detailed description of the programs included in the packages.

The aligners adopt a mapping paradigm called seed-and-vote [1]. This is an elegantly simple multi-seed strategy for mapping reads to a reference genome. The strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done.
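The voting idea can be illustrated with a small base R sketch. This is a toy under stated assumptions, not the Subread implementation: the function names find_seed and seed_and_vote, the 4-base subreads, the step of 2 bases, the exact matching of seeds by scanning the reference, and the example sequences are all hypothetical choices made for illustration.

```r
# Toy seed-and-vote sketch (illustrative only; not the Subread implementation).
# Assumptions: 4-base subreads taken every 2 bases, exact seed matching, no indels.

# Return all 1-based start positions at which `seed` occurs in `ref`.
find_seed <- function(seed, ref) {
  k <- nchar(seed)
  starts <- seq_len(nchar(ref) - k + 1)
  starts[substring(ref, starts, starts + k - 1) == seed]
}

# Let every subread vote for the read start position implied by each of its hits.
seed_and_vote <- function(read, ref, k = 4, step = 2) {
  offsets <- seq(1, nchar(read) - k + 1, by = step)  # subread start positions
  votes <- integer(0)
  for (off in offsets) {
    hits <- find_seed(substr(read, off, off + k - 1), ref)
    votes <- c(votes, hits - off + 1)
  }
  if (length(votes) == 0) return(NULL)               # no subread matched
  tally <- table(votes)
  best <- which.max(tally)                           # location with most votes
  list(position = as.integer(names(tally)[best]),
       votes    = as.integer(tally[best]),
       subreads = length(offsets))
}

ref  <- "TTGACCGTAGGCTAACGTTAGCCATGGA"
read <- "GTAGGCTATCGTTA"  # ref positions 7 to 20 with one mismatch at read base 9
res  <- seed_and_vote(read, ref)
print(res)  # position 7, supported by 4 of the 6 subreads
```

The two subreads that cover the mismatch find no match and simply cast no vote, while the remaining four agree on the same start position. A real aligner would then fill in the mismatch by a conventional alignment between the voting subreads.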

The strategy is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.

Subread is a general-purpose read aligner. It can be used to align reads generated from both genomic DNA sequencing and RNA sequencing technologies. It has been successfully used in a number of high-profile studies [2, 3, 4, 5, 6]. Subjunc is specifically designed to detect exon-exon junctions and to perform full alignments for RNA-seq reads. Note that Subread performs local alignments for RNA-seq reads, whereas Subjunc performs global alignments for RNA-seq reads. The aligners include a read re-alignment step in which reads are re-aligned using genomic variation data and junction data collected from the initial mapping.

Subindel carries out local read assembly to discover long insertions and deletions.

Read mapping should be performed before running this program.

featureCounts is designed to assign mapped reads or fragments (paired-end data) to genomic features such as genes, exons and promoters. It is a lightweight read counting program suitable for counting both gDNA-seq and RNA-seq reads for genomic features [7]. The Subread-featureCounts-limma/voom pipeline has been found to be one of the best-performing pipelines for the analysis of RNA-seq data by the SEquencing Quality Control (SEQC) study, the third stage of the well-known MicroArray Quality Control (MAQC) project [8].

Also included in this software suite is a very efficient SNP caller, exactSNP. It measures local background noise for each candidate SNP and then uses that information to accurately call SNPs.

These software programs support a variety of sequencing platforms. They are released in two packages: the SourceForge Subread package and the Bioconductor Rsubread package [9].

Chapter 2

Citation

If you use Rsubread, you can cite:

Liao Y, Smyth GK and Shi W (2019). The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Research, 47(8).

If you use featureCounts, you can cite:

Liao Y, Smyth GK and Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7).

If you use the Subread or Subjunc aligners, you can cite:

Liao Y, Smyth GK and Shi W (2013). The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10).

Download and installation

Install Bioconductor Rsubread package

R software needs to be installed on your computer before you can install this package. Launch R and issue the following command to install Rsubread:

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("Rsubread")

Alternatively, you may download it from the Rsubread web page and install it manually.

Install SourceForge Subread package

Install from a binary distribution

This is the easiest way to install the SourceForge Subread package.

Binary distributions are available for Linux, Macintosh and Windows operating systems, and they can be downloaded from the SourceForge site. The Linux binary distribution can be run on multiple Linux variants including Debian, Ubuntu, Fedora and CentOS.

To install the Subread package on FreeBSD or Solaris, you will have to install from source.

Install from source on a Unix or Macintosh computer

Download the Subread source package to your working directory and uncompress it with the tar zxvf command. Enter the source directory of the package and issue the make -f command with the makefile for your operating system; separate install commands are given for Linux, Mac OS X, FreeBSD, and Oracle Solaris or OpenSolaris.

A new directory called bin will be created under the home directory of the software package, and the executables generated from the compilation are saved to that directory.

To enable easy access to these executables, you may copy them to a system directory such as /usr/bin or add the path to them to your search path (your search path is usually specified in the environment variable PATH).

Install from source on a Windows computer

The MinGW software tool needs to be installed to compile Subread.

How to get help

The Bioconductor support site or the Google Subread group (…#!forum/subread) are the best places to post questions or make suggestions.

Chapter 3 The seed-and-vote mapping paradigm

Seed-and-vote

We have developed a new read mapping paradigm called seed-and-vote for efficient, accurate and scalable read mapping [1]. The seed-and-vote strategy uses a number of overlapping seeds from each read, called subreads. Instead of trying to pick the best seed, the strategy allows all the seeds to vote on the optimal location for the read.

The algorithm then uses more conventional alignment algorithms to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The following figure illustrates the proposed seed-and-vote mapping approach with a toy example.

The aligners developed under the seed-and-vote paradigm include Subread, a general-purpose read aligner that can be used to map both genomic DNA-seq and RNA-seq read data. Its running time is determined by the number of subreads extracted from each read, not by the read length. Thus it has excellent mapping scalability, i.e. its running time increases only modestly as read length increases. It uses the largest mappable region in the read to determine its mapping location; therefore it automatically determines whether a global alignment or a local alignment should be found for the read.
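One way to see why running time can depend on the number of subreads rather than on read length is to place a fixed number of subreads evenly along the read, whatever its length. The base R sketch below does this. It is only an illustration: the function name subread_starts, the 16-base subread length and the count of 10 subreads per read are assumptions, chosen so that 10 x 16 = 160 agrees with the Introduction's statement that subreads overlap when reads are shorter than 160 bp. The actual selection scheme used by Subread may differ.

```r
# Sketch: place a fixed number of subreads evenly along a read.
# Assumptions (not stated in the text): 16-base subreads, 10 subreads per read.
subread_starts <- function(read_length, n_subreads = 10, subread_length = 16) {
  last_start <- read_length - subread_length + 1
  stopifnot(last_start >= 1)  # the read must hold at least one subread
  # Evenly spaced 1-based start positions from the first to the last valid start.
  unique(round(seq(1, last_start, length.out = n_subreads)))
}

starts_100 <- subread_starts(100)  # spacing below 16 bases: subreads overlap
starts_200 <- subread_starts(200)  # spacing of 16 bases or more: no overlap

print(starts_100)
print(starts_200)
```

Both calls return ten start positions, so the same number of seed lookups is made for a 100 bp read and a 200 bp read; only the spacing between subreads changes.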

