
BLAST: Basic Local Alignment Search Tool



BLAST Program Selection Guide

1. Introduction

NCBI has provided BLAST sequence analysis services for over a decade. For many users, the first question they face is "Which BLAST program should I use?" To help users answer this question, we created this "BLAST Program Selection Guide."

This document first introduces the BLAST databases available from NCBI (Section 2). The actual guide (Section 3) divides BLAST searches into several categories according to the nature and size of the input query and the primary goal of the search. Starting from the query sequence column on the left and cross-referencing to the right, a user will arrive at the specific BLAST program(s) best suited for that search. This document is also available in PDF (163,516 bytes).

2. BLAST Database Content

A BLAST search has four components: query, database, program, and search purpose/goal. To discuss effective BLAST program selection, we first need to know what databases are available and what sequences these databases contain. In this section, we first look at the common BLAST databases. According to their content, they are grouped into nucleotide and protein databases. These databases and their detailed compositions are listed in the two tables below. NCBI also provides specialized BLAST databases such as the vector screening database, a variety of genome databases for different organisms, and trace databases.
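The four components named above can be pictured as a small record. The following is a minimal Python sketch, not an NCBI API: the class name `BlastSearch`, its fields, and the `DATABASES` mapping are hypothetical names introduced here for illustration, and the mapping lists only a small, illustrative subset of databases grouped by content type.

```python
from dataclasses import dataclass

# Illustrative subset only (an assumption for this sketch): a few BLAST
# database names grouped by content type. "nr" exists in both a nucleotide
# and a protein flavor, so the type has to be stated alongside the name.
DATABASES = {
    "nucleotide": {"nr", "est", "dbsts", "env_nt"},
    "protein": {"nr", "pat"},
}


@dataclass(frozen=True)
class BlastSearch:
    """Hypothetical record of the four components of a BLAST search."""

    query: str      # the input sequence (or an identifier for it)
    database: str   # name of the database to search
    db_type: str    # "nucleotide" or "protein"
    program: str    # e.g. "blastn" or "blastp"
    goal: str       # free-text description of the search purpose

    def __post_init__(self):
        # Reject combinations that are not in the illustrative mapping.
        if self.db_type not in DATABASES:
            raise ValueError(f"unknown database type: {self.db_type!r}")
        if self.database not in DATABASES[self.db_type]:
            raise ValueError(
                f"{self.database!r} is not listed as a {self.db_type} database"
            )


search = BlastSearch(
    query="ATGGCGTACGTT",
    database="est",
    db_type="nucleotide",
    program="blastn",
    goal="find matching transcripts",
)
print(search)
```

Keeping the content type next to the database name is a design choice made for this sketch; it reflects that the same name can refer to different collections on the nucleotide and protein sides.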

The genome database contents for three important model organisms (human, mouse, and rat) are described in the table of genome BLAST databases. For other organisms, the content of their genome BLAST pages will be listed when these special BLAST pages are discussed.

Table: Content of Protein Sequence Databases

Database        Content Description
nr              Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
refseq_protein  Protein sequences from NCBI Reference Sequence project.
swissprot       Last major release of the SWISS-PROT protein sequence database (no incremental updates).
pat             Proteins from the Patent division of GenBank.
month           All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
pdb             Sequences derived from the 3-dimensional structure records from the Protein Data Bank.
env_nr          Non-redundant CDS translations from env_nt entries.
Smart           663 PSSMs from Smart, no longer actively maintained.
Pfam            7255 PSSMs from Pfam, not the latest.
COG             4873 PSSMs from NCBI COG set.
KOG             4825 PSSMs from NCBI KOG set (eukaryotic COG equivalent).
CDD             11399 PSSMs from NCBI curated cd set.

NOTE: The source table marks the default database in bold. The PSSM databases (Smart, Pfam, COG, KOG, CDD) are searchable only from the rpsblast page; the actual version may vary.

Table: Nucleotide Databases for BLAST

Database        Content Description
nr              All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
refseq_mrna     mRNA sequences from NCBI Reference Sequence project.
refseq_genomic  Genomic sequences from NCBI Reference Sequence project.
est             Database of GenBank + EMBL + DDBJ sequences from EST division.
est_human       Human subset of est.
est_mouse       Mouse subset of est.
est_others      Subset of est other than human or mouse.
gss             Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs            Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr.
pat             Nucleotides from the Patent division of GenBank.
pdb             Sequences derived from the 3-dimensional structure records from Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record.
month           All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days.
alu_repeats     Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994).
dbsts           Database of Sequence Tag Site entries from the STS division of GenBank + EMBL + DDBJ.
chromosome      Complete genomes and complete chromosomes from the NCBI Reference Sequence project. It overlaps with refseq_genomic.
wgs             Assemblies of Whole Genome Shotgun sequences.
env_nt          Sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is the Sargasso Sea project. This does NOT overlap with nucleotide nr.

NOTE: The source table marks the default database in bold.

Table: Genome BLAST Databases and Contents

Database                       Description
genome (all assemblies)*       This database represents the current public build of the genome. The sequences in this database have RefSeq accession numbers of type NT_?????? or NW_??????, and these represent either contigs (from a clone based assembly) or supercontigs (from a whole genome shotgun or composite assembly). The contigs in this database are from both the reference assembly and any alternate assemblies available for the genome. This database is generated at the time of a genome build.
genome (reference only)        This database represents the current public build of the genome. The sequences in this database have RefSeq accession numbers of type NT_?????? or NW_??????, and these represent either contigs (from a clone based assembly) or supercontigs (from a whole genome shotgun or composite assembly). The contigs in this database are from only the reference assembly. This database is generated at the time of a genome build.
[...]                          This database is a collection of all sequences in GenBank that have an HTG keyword. This allows users to search htgs_phase3 sequences (normally found in NR) and htgs_phase0, 1 and 2 sequences (normally found in HTGS) at the same time.
RefSeq RNA                     Collection of reference mRNAs generated by the NCBI RefSeq project. This database is generated daily.
RefSeq protein                 Collection of reference proteins generated by the NCBI RefSeq project. This database is generated daily.
Build RNA                      Collection of reference mRNAs generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome build.
Build protein                  Collection of reference proteins generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome build.
Ab Initio RNA                  Collection of ab initio RNA predictions generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome build.
Ab Initio protein              Collection of ab initio protein predictions generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome build.
ESTs                           Single pass sequence reads from cDNA libraries. This database is updated daily.
BAC ends                       The end sequences of BAC clones. This database is generated daily.
Traces-WGS                     All of the raw organism WGS traces. This database is updated as needed.
Traces-ESTs                    All of the raw organism EST traces. This database is updated as needed.
[...]                          All of the raw organism non-WGS and non-EST traces. This database is updated as needed.
WGS contigs                    If an organism was assembled using a whole genome shotgun (WGS) strategy, this database is available (if the WGS assembly is in GenBank). This database is updated as needed.
Gene Trap Clones (Mouse Only)  A collection of sequences generated by performing Gene Trap insertions. This database is [...]
Dog Assembly (boxer)           The supercontigs from the Whole Genome Shotgun (WGS) assembly from a [...] coverage whole genome library. This assembly was performed at the Broad Institute using the Arachne assembler.
Celera Dog Assembly (Poodle)   This database is a collection of the Whole Genome Shotgun (WGS) contigs assembled from a whole genome library.
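The genome databases above identify their sequences by RefSeq accessions of the form NT_?????? or NW_??????. As a rough illustration, the Python sketch below checks whether a string has one of these two prefixes. It assumes each "?" stands for a single digit and allows an optional ".version" suffix; both are assumptions made for this example, and the function name `classify_accession` is hypothetical. The sketch reports only the prefix and does not try to decide whether a record is a contig or a supercontig.

```python
import re

# Assumption: each "?" in NT_?????? / NW_?????? is one digit, optionally
# followed by a ".N" version suffix. Real accessions may be longer.
_ACCESSION = re.compile(r"^(NT|NW)_\d{6}(\.\d+)?$")


def classify_accession(accession: str):
    """Return the prefix ("NT" or "NW") if the string matches, else None."""
    match = _ACCESSION.match(accession.strip())
    return match.group(1) if match else None


# Illustrative inputs only; they are not taken from the guide.
for acc in ["NT_123456", "NW_000001.2", "NM_000546"]:
    print(acc, classify_accession(acc))
```

A check like this could be used to filter BLAST hits down to those that come from a genome build database, provided the digit-count assumption is adjusted to match the accessions actually returned.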

