Introduction to NCBI BLAST
NCBI BLAST, the Basic Local Alignment Search Tool, is a suite of programs designed to search all available sequence databases for similarities between a protein or DNA query and known sequences. BLAST quickly detects both close and distant sequence relationships and reports scores that let the user distinguish real matches from background hits with a high degree of statistical accuracy.
BLAST focuses on local alignments, using a heuristic algorithm to detect relationships between sequences that may share only isolated regions of similarity. BLAST takes sequence length and the nucleotide or peptide composition of the query into account when assigning alignment scores. For sequences shorter than 200 residues, an effective length is used to compensate for “edge effects”. BLAST programs report the significance of each alignment as an E-value: the number of alignments with an equal or better score that would be expected by chance in a search of that database, so a lower E-value indicates a stronger match. E-values are reported instead of the traditional P-value to improve resolution between low-scoring alignments; for highly significant alignments (P < 0.01), the two values are nearly equal.
For more detailed information on how BLAST scores are calculated, visit:
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
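The relationship between E-values and P-values can be made concrete. Under the Poisson model used in BLAST statistics, the probability of seeing at least one chance alignment is P = 1 - exp(-E). The Python sketch below is illustrative only; the function name `p_value_from_e_value` is an assumption of this example and is not part of any BLAST program.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Return P = 1 - exp(-E), the chance of at least one random hit.

    The function name is hypothetical; only the formula comes from
    BLAST statistics.
    """
    if e_value < 0:
        raise ValueError("E-value must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Small E-values track P closely; large E-values push P toward 1.
    for e in (1e-5, 0.01, 1.0, 10.0):
        print(f"E = {e:g}  P = {p_value_from_e_value(e):.6g}")
```

For small values the two numbers are almost identical, which is why they can be used interchangeably for strong matches, while for weak matches P saturates near 1 and E continues to grow.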
For most first-time users of BLAST, choosing the right sub-program may be difficult. BLAST offers a variety of search tools for different types of queries. In general, the best choice of program depends upon the sequence length, the database being searched, and the information requested in the search.
Nucleotide BLAST is a collection of programs for comparing a nucleotide query against the nucleotide sequences in the database. BLAST accepts input in a variety of forms, including FASTA sequences, GenBank records, and accession or GI numbers, and compares it with the NCBI databases. MEGABLAST concatenates multiple queries to save time scanning the database and is intended for quickly aligning sequences longer than 28 bases. For shorter sequences, such as primers, standard nucleotide-nucleotide BLAST offers automatic parameter settings suited to these queries.
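As an illustration of the FASTA input format mentioned above, the following Python sketch reads FASTA-formatted text into a dictionary. It is a minimal example rather than the parser BLAST itself uses; the function name `parse_fasta` and the record names `primer_fwd` and `query_1` are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its sequence, with line breaks removed."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            current_id = header[0] if header else "unnamed"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are joined and upper-cased.
            records[current_id] += line.upper()
    return records


if __name__ == "__main__":
    # Hypothetical input: a short primer and a longer query split over two lines.
    example = """>primer_fwd example primer
    atggccattg
    >query_1
    ATGGCCATTGTAATGGGCCGC
    TGAAAGGGTGCCCGATAG
    """
    for name, seq in parse_fasta(example.splitlines()).items():
        print(name, len(seq), seq)
```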
Protein BLAST is a collection of programs used to find protein sequences similar to a query. These programs accept sequences in the same formats as Nucleotide BLAST. PSI-BLAST is a position-specific, iterated algorithm that uses the sequences found in each round as the basis for scoring the sequences searched in the next round. It distinguishes between highly and weakly conserved positions in the sequence, which can increase sensitivity with each iteration. The companion PHI-BLAST (pattern-hit initiated BLAST) option accepts a regular expression pattern, allowing users to identify sequences that contain the pattern and are homologous to the query protein sequence. As with Nucleotide BLAST, Protein BLAST includes automatic parameter settings for shorter sequences.
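The idea of distinguishing highly conserved from weakly conserved positions can be shown with a toy profile. The Python sketch below computes per-column residue frequencies from a small set of aligned sequences of equal length. It is a simplification for illustration: PSI-BLAST builds a position-specific scoring matrix with sequence weighting and pseudocounts, which this example omits. The function names and the example alignment are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def column_frequencies(alignment: List[str]) -> List[Dict[str, float]]:
    """Return, for each alignment column, the fraction of each residue."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({res: n / len(alignment) for res, n in counts.items()})
    return profile


def conservation(profile: List[Dict[str, float]]) -> List[float]:
    """Score each column by the frequency of its most common residue."""
    return [max(column.values()) for column in profile]


if __name__ == "__main__":
    # Hypothetical hits from one search round, already aligned to the query.
    hits = ["ACDE", "ACDF", "ACGE"]
    for position, score in enumerate(conservation(column_frequencies(hits)), 1):
        print(f"position {position}: conservation {score:.2f}")
```

Columns with a score near 1 are the highly conserved positions; in an iterated search these would be weighted more heavily than variable columns in the next round.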
Translating BLAST operates in a similar fashion to both the nucleotide and protein search routines. BLASTX translates a nucleotide query into protein sequences in each of the six reading frames before comparing it with the protein databases. TBLASTN compares a protein query against a database of nucleotide sequences that is translated in each of the six reading frames during the search.
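Six-frame translation can be sketched in a few lines of Python. The example below translates a DNA sequence in the three forward frames and the three frames of its reverse complement, using the standard genetic code. It illustrates the concept only and is not the translation code used by BLAST; the function names are assumptions of this example, and codons containing ambiguous bases are reported as `X`.

```python
from typing import Dict

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> Dict[int, str]:
    """Map frames +1, +2, +3, -1, -2, -3 to their protein translations."""
    frames = {}
    reverse = reverse_complement(dna)
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCATTGTAATGGGCCGCTGA").items()):
        print(f"frame {frame:+d}: {protein}")
```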
Users can refer to the NCBI BLAST program selection guide for more information:
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml
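The basic pairing of query type and database type with a BLAST program can be summarized as a lookup table. The Python sketch below is a simplified aid, not a replacement for the selection guide: it ignores sequence length and search goals, and the function name `choose_program` is hypothetical. The program names are written in the lower-case form commonly used for the standard programs.

```python
# (query type, database type) -> basic BLAST program
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database translated in six frames
}


def choose_program(query_type: str, database_type: str) -> str:
    """Return the basic BLAST program for a query/database pairing."""
    key = (query_type.strip().lower(), database_type.strip().lower())
    if key not in PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return PROGRAMS[key]


if __name__ == "__main__":
    print(choose_program("nucleotide", "protein"))
    print(choose_program("Protein", "Nucleotide"))
```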
Users can access BLAST tools directly through the web or through a variety of software applications, such as MiraiBio’s DNASIS SmartNote, which helps users find and organize sequences and automatically submits them to the BLAST programs. DNASIS SmartNote can also BLAST multiple sequences “in batch”, without the tedium of copying and pasting each sequence and waiting for each result to come back.
To learn more about DNASIS SmartNote, visit http://smartnote.miraibio.com.


