STAR manual 2.7 - Cornell University

STAR manual 23, 2019

Contents

1 Getting started
    Installation
        Installation - in depth and troubleshooting
    Basic workflow
2 Generating genome indexes
    Basic options
    Advanced options
        ... chromosomes/scaffolds/patches to include?
        ... annotations to use?
        ... in GFF format
        ... a list of annotated junctions
        ... small genome
        ... with a large number of references
3 Running mapping jobs
    Basic options
    Advanced options
        ... annotations at the mapping stage
        ... options
    Using shared memory for the genome indexes
4 Output files
    Log files
    SAM
        ... attributes
        ... with Cufflinks/Cuffdiff
    Unsorted and sorted-by-coordinate BAM
    Splice junctions
5 Chimeric and circular ...
    STAR-Fusion
    Chimeric alignments in the main BAM files
    Chimeric alignments ...
    Chimeric alignments ...
6 Output in transcript ...
7 Counting number of reads per ...
8 2-pass ...
    Multi-sample 2-pass mapping
    Per-sample 2-pass mapping
    2-pass mapping with re-generated genome
9 Merging and mapping of overlapping paired-end ...
10 Detection of personal variants overlapping ...
11 WASP filtering of allele specific ...
12 Detection of multimapping ...
13 STARsolo: mapping, demultiplexing and gene quantification for single cell RNA-seq
14 Description of all ...
    Parameter Files
    System
    Run Parameters
    Genome Parameters

    Genome Indexing Parameters - only used with runMode genomeGenerate
    Splice Junctions Database
    Variation parameters
    Input Files
    Read Parameters
    ... : general
    ... : SAM and BAM
    ... processing
    ... Wiggle
    ... Filtering
    ... Filtering: Splice Junctions
    ... and Seeding
    ... reads: presently unsupported/undocumented
    ..., Anchors, Binning
    ... Alignments
    ... of Annotations
    ... Mapping
    ... parameters
    ... (single cell RNA-seq) parameters

1 Getting started

Installation

STAR source code and binaries can be downloaded from GitHub: named releases, or the master branch. The pre-compiled STAR executables are located in the bin/ subdirectory.

The static executables are the easiest to use, as they are statically compiled and do not depend on external libraries. To compile STAR from sources, run make in the source directory for a Linux-like environment, or run make STARforMac for Mac OS X. This will produce the executable STAR inside the source directory.

Installation - in depth and troubleshooting

STAR is compiled with the gcc C++ compiler and depends only on standard gcc libraries. Some generic instructions on installing correct gcc environments are given below.

Ubuntu.
    $ sudo apt-get update
    $ sudo apt-get install g++
    $ sudo apt-get install make

Red Hat, CentOS, Fedora.
    $ sudo yum update
    $ sudo yum install make
    $ sudo yum install gcc-c++
    $ sudo yum install glibc-static

SUSE.
    $ sudo zypper update
    $ sudo zypper in gcc gcc-c++

Mac OS X.
Current versions of Mac OS X Xcode are shipped with Clang replacing the standard gcc compiler. Standard Clang does not support OpenMP, which creates problems for STAR compilation. One option to avoid this problem is to install gcc (preferably using the homebrew package manager). Another option is to add OpenMP functionality to Clang.

Basic workflow

STAR workflow consists of 2 steps:

1. Generating genome index files (see Section 2. Generating genome indexes).
   In this step the user supplies the reference genome sequences (FASTA files) and annotations (GTF file), from which STAR generates genome indexes that are utilized in the 2nd (mapping) step.

   The genome indexes are saved to disk and need only be generated once for each genome/annotation combination. A limited collection of STAR genomes is available; however, it is strongly recommended that users generate their own genome indexes with the most up-to-date assemblies and annotations.

2. Mapping reads to the genome (see Section 3. Running mapping jobs).
   In this step the user supplies the genome files generated in the 1st step, as well as the RNA-seq reads (sequences) in the form of FASTA or FASTQ files. STAR maps the reads to the genome and writes several output files, such as alignments (SAM/BAM), mapping summary statistics, splice junctions, unmapped reads, signal (wiggle) tracks etc.

   Output files are described in Section 4. Output files.

Mapping is controlled by a variety of input parameters (options) that are described in brief in Section 3. Running mapping jobs, and in more detail in the section that describes all options.

The STAR command line has the following format:

    STAR --option1-name option1-value(s) --option2-name option2-value(s) ...

If an option can accept multiple values, they are separated by spaces, and in a few cases by commas.

2 Generating genome indexes

Basic options

The basic options to generate genome indices are as follows:

    --runThreadN NumberOfThreads
    --runMode genomeGenerate
    --genomeDir /path/to/genomeDir
    --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ...
    --sjdbGTFfile /path/...

--runThreadN defines the number of threads to be used for genome generation; it has to be set to the number of available cores on the server.

--runMode genomeGenerate directs STAR to run genome indices generation.

--genomeDir specifies the path to the directory (henceforth called "genome directory") where the genome indices are stored. This directory has to be created (with mkdir) before the STAR run and needs to have write permissions. The file system needs to have at least 100GB of disk space available for a typical mammalian genome. It is recommended to remove all files from the genome directory before running the genome generation step. This directory path will have to be supplied at the mapping step to identify the reference genome.

--genomeFastaFiles specifies one or more FASTA files with the genome reference sequences. Multiple reference sequences (henceforth called "chromosomes") are allowed for each FASTA file. You can rename the chromosome names in the chromosome names file, keeping the order of the chromosomes in the file: the names from this file will be used in all output alignment files (such as SAM/BAM).
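The genome directory requirements described above (created beforehand, writable, at least 100GB free, preferably empty) could be checked before launching genome generation. The following is a minimal shell sketch and not part of STAR: the helper name check_genome_dir and the use of df to read free space are assumptions made for illustration, and the default threshold is simply the 100GB figure quoted for a typical mammalian genome.

```shell
#!/bin/sh
# Sketch: prepare and sanity-check a genome directory before genome generation.
# check_genome_dir is a hypothetical helper, not a STAR command.
# Usage: check_genome_dir DIR [MIN_FREE_GB]   (default threshold: 100)
check_genome_dir() {
    dir=$1
    min_gb=${2:-100}

    # The directory has to exist before STAR runs.
    mkdir -p "$dir" || return 1

    # STAR needs write permission in the directory.
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi

    # Warn if the directory already holds files (removing them is recommended).
    if [ -n "$(ls -A "$dir")" ]; then
        echo "warning: $dir is not empty" >&2
    fi

    # Available space in 1024-byte blocks (POSIX df -Pk, 4th column), as whole GB.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    free_gb=$((free_kb / 1024 / 1024))
    if [ "$free_gb" -lt "$min_gb" ]; then
        echo "only ${free_gb}GB free, need ${min_gb}GB" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

A call such as check_genome_dir /path/to/genomeDir returns a non-zero status when any requirement fails, so it can guard the genome generation command in a job script. Smaller genomes need less space, which is why the threshold is left as an optional second argument.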

Tabs are not allowed in chromosome names, and spaces are not recommended.

--sjdbGTFfile specifies the path to the file with annotated transcripts in the standard GTF format. STAR will extract splice junctions from this file and use them to greatly improve accuracy of the mapping. While this is optional, and STAR can be run without annotations, using annotations is highly recommended whenever they are available. In later versions, the annotations can also be included on the fly at the mapping step.

A further option specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database. Ideally, this length should be equal to ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.

Genome files comprise binary genome sequence, suffix arrays, text chromosome names/lengths, splice junctions coordinates, and transcripts/genes information. Most of these files use an internal STAR format and are not intended to be utilized by the end user. Changing any of these files is strongly discouraged, with one exception: you can rename the chromosome names in the chromosome names file, keeping the order of the chromosomes in the file: the names from this file will be used in all output files (e.g. SAM/BAM).
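The overhang rule and the basic options described above can be combined into a single genome generation command. The sketch below is illustrative only: sjdb_overhang and print_genome_generate_cmd are hypothetical helper names, the thread count and all paths are placeholders, and --sjdbOverhang is the STAR option that sets the junction overhang length (its name does not appear in the text above). The script prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Sketch: derive the overhang from read lengths and print (not run) a
# genome generation command. Helper names and all paths are placeholders.

# Ideal overhang = max(ReadLength) - 1.
sjdb_overhang() {
    max=0
    for len in "$@"; do
        if [ "$len" -gt "$max" ]; then
            max=$len
        fi
    done
    echo $((max - 1))
}

# Print the STAR command line for genome generation.
# Arguments: THREADS GENOME_DIR GTF OVERHANG FASTA [FASTA ...]
print_genome_generate_cmd() {
    threads=$1
    genome_dir=$2
    gtf=$3
    overhang_len=$4
    shift 4
    echo "STAR --runThreadN $threads --runMode genomeGenerate" \
         "--genomeDir $genome_dir --genomeFastaFiles $*" \
         "--sjdbGTFfile $gtf --sjdbOverhang $overhang_len"
}

# 2x100 paired-end reads: both mates are 100 bases long.
overhang=$(sjdb_overhang 100 100)
print_genome_generate_cmd 8 /path/to/genomeDir /path/to/annotations.gtf \
    "$overhang" /path/to/genome/fasta1 /path/to/genome/fasta2
```

For 2x100 reads the helper yields 99, matching the worked example above. The thread count of 8 is arbitrary and would normally be set to the number of available cores.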
