Database search with Blast: Hints
You will find below some hints for the EMBnet course on database search with blast.
Database search with Blastp - Hints
- First you will find some information about the last update to the different databases.
This is followed by information about the blast program, the query sequence and the searched database.
Then you will find a graphical summary of the matches, followed by one-line descriptions of the matches, each with a link to the sequence, the score and the e-value.
Below that are the details of each alignment, followed by some statistics about the search.
- You can look either at the beginning or at the end of the results to find the information about the size of the database.
- The score in bits is derived from the raw score of an alignment, but it has been normalised to be independent of the scoring system. Therefore it can be used to compare different alignments (see also Altschul's description of bit scores).
The e-value (Expect value) describes the number of hits one can expect to see just by chance when searching a database of a particular size. The lower the e-value, the more significant the match.
- Homology means that the proteins are related by the evolutionary process of divergence from a common ancestor.
It is often quite difficult to ascertain homology of two sequences based only on sequence comparison. There are other factors that must be considered, like structure, function, genomic localisation, expression and so on.
Note: Homology is not the same as similarity!
Similarity can be quantified (e.g. "two sequences are 95% identical").
Homology is an absolute statement, either yes or no.
- As described in the pairwise comparison hints page, the blosum90 matrix is more stringent concerning related amino acids than blosum45.
Roughly speaking, using blosum90 will increase selectivity: it will exclude more of the weak false matches, at the risk of missing some weak true matches.
Using blosum45 will increase sensitivity: it will include more (ideally all) of the true matches, but decrease selectivity by including some false ones.
- See the gap opening and extension questions on the pairwise comparison hints page.
- Low complexity regions inside biological sequences can produce significant alignments to other sequences that also contain low complexity regions but are not actually related. The low complexity filter tries to identify these regions and mask them so as to exclude them from the alignment computation.
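The relationship between raw score, bit score and e-value mentioned in the hints above can be sketched with the standard Karlin-Altschul formulas: the bit score is S' = (lambda * S - ln K) / ln 2, and the e-value is approximately m * n * 2^(-S'), where m is the query length and n the database length. The values of lambda and K below are illustrative assumptions (roughly those commonly reported for blosum62 with gapped alignments), and the sequence and database lengths are made up. The real blast program also applies length corrections, so its numbers will differ somewhat.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Approximate number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Assumed statistical parameters (illustrative, not read from any blast output).
LAMBDA = 0.267
K = 0.041

bits = bit_score(100, LAMBDA, K)

# Hypothetical query of 300 residues against two database sizes.
e_small_db = expect_value(bits, 300, 1_000_000)
e_large_db = expect_value(bits, 300, 100_000_000)

print(f"bit score: {bits:.1f}")
print(f"e-value, small database: {e_small_db:.2e}")
print(f"e-value, large database: {e_large_db:.2e}")
```

For the same bit score, the expected number of chance hits grows in proportion to the size of the database, which is why the database size reported in the blast output matters when interpreting an e-value.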
Database search with Blastx - Hints
- blastx compares a DNA sequence translated into its 6 reading frames to a protein database.
blastp compares a protein sequence to a protein database.
blastn compares a nucleotide sequence to a nucleotide database.
- The EST contains a coding sequence in the +1 reading frame.
- The EST is biased towards the C-terminus of the protein. This is quite frequent because the methodology used to produce ESTs favours sequences located at the 3' end of the RNA transcript.
- The similar regions are located at the beginning of the EST, but more or less at the end of the protein. This seems to be a MAGE domain.
- See the low complexity filter questions in the previous exercise.
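To illustrate what "translated into its 6 reading frames" means for blastx, here is a minimal sketch of the translation step only, using the standard genetic code. The function names and the toy sequence are hypothetical and chosen for illustration. This is not how blastx is implemented; the real program also aligns each translated frame against the protein database and scores the matches.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the six conceptual translations, keyed +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy sequence (made up); a stop codon is shown as '*'.
for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

A frame that translates without internal stop codons over a long stretch, like the +1 frame of the EST in this exercise, is a good candidate for containing a coding sequence.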