Identification of Known Domain in a Protein Sequence
What can you predict about this sequence
MGIQGLAKLI ADVAPSAIRE NDIKSYFGRK VAIDASMSIY QFLIAVRQGG DVLQNEEGET TSHLMGMFYR TIRMMENGIK PVYVFDGKPP QLKSGELAKR SERRAEAEKQ LQQAQAAGAE QEVEKFTKRL VKVTKQHNDE CKHLLSLMGI PYLDAPSEAE ASCAALVKAG KVYAAATEDM DCLTFGSPVL MRHLTASEAK KLPIQEFHLS RILQELGLNQ EQFVDLCILL GSDYCESIRG IGPKRAVDLI QKHKSIEEIV RRLDPNKYPV PENWLHKEAH QLFLEPEVLD PESVELKWSE PNEEELIKFM CGEKQFSEER IRSGVKRLSK SRQGSTQGRL DDFFKVTGSL SSAKRKEPEP KGSTKKKAKT GAAGKFKRGK
using the following Motif-Scan servers?
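The sequence above is printed in blocks of ten residues, and some query forms may not accept the embedded spaces. As a minimal sketch, assuming a server expects plain FASTA input, the blocks can be joined as follows. The function name `to_fasta` and the header `query` are illustrative choices, not part of any server's interface, and only the first two blocks of the sequence are shown in the usage line.

```python
def to_fasta(raw, header="query", width=60):
    """Join a space-separated protein sequence into FASTA text.

    `header` and `width` are arbitrary illustrative defaults.
    """
    # str.split() with no argument removes spaces, tabs and newlines alike.
    residues = "".join(raw.split()).upper()
    # Cut the residue string into lines of at most `width` characters.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Usage: paste the full sequence from the exercise in place of this excerpt.
print(to_fasta("MGIQGLAKLI ADVAPSAIRE"))
```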
How different are these predictions from each other? Which server do you prefer? Why?
Retrieve the SwissProt entry that corresponds to the above sequence and observe how the predictions of the different Motif-Scan servers are incorporated into the annotations of the Swiss-Prot entry.
Protein Classification based on Domain Architecture
Using the Hits protein workbench, retrieve all proteins in SwissProt that contain a match to the Prosite profiles 53EXO_N_DOMAIN and 53EXO_I_DOMAIN:
Try to regroup these proteins into a few families by looking at their domain architecture. The most useful tool for this purpose is probably the sequence element viewer SEView. Two keys are quite useful in establishing the classification:
Does your classification reflect the IDs given by the SwissProt annotators? As this task is rather time-consuming, you can get a pre-established classification here, but note that some domain definitions have been altered. Form your own idea before looking at this page.
These two other links might help you sort out the different types of domain architecture found in these proteins. You must, however, supply one of the sequences as a query to launch them.
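Regrouping proteins by domain architecture amounts to sorting each protein's domain matches by position and collecting the proteins that share the same ordered list of domains. The sketch below illustrates that idea only; it is not how SEView or the Hits workbench works internally. The protein names, the start positions and the domain `OTHER_DOMAIN` are made-up placeholders, and real matches would have to be copied by hand from the workbench output.

```python
from collections import defaultdict


def group_by_architecture(matches):
    """Group proteins that share the same ordered series of domains.

    `matches` maps a protein name to a list of (start, domain) pairs.
    """
    families = defaultdict(list)
    for protein, hits in matches.items():
        # Sort by start position, then keep only the domain names.
        architecture = tuple(domain for _, domain in sorted(hits))
        families[architecture].append(protein)
    return dict(families)


# Placeholder input: names and coordinates are hypothetical.
example = {
    "PROT_A": [(1, "53EXO_N_DOMAIN"), (120, "53EXO_I_DOMAIN")],
    "PROT_B": [(125, "53EXO_I_DOMAIN"), (3, "53EXO_N_DOMAIN")],
    "PROT_C": [(1, "53EXO_N_DOMAIN"), (120, "53EXO_I_DOMAIN"),
               (400, "OTHER_DOMAIN")],
}

for architecture, members in group_by_architecture(example).items():
    print(" - ".join(architecture), ":", ", ".join(members))
```

Grouping on the ordered tuple keeps two proteins with the same domains in a different order in separate families, which is usually what a classification by architecture intends.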
The previous exercise gave you some knowledge of the names and domain architectures of a group of related proteins. You will now exploit this knowledge to observe the behaviour of PSI-BLAST. You can use either the NCBI web interface or our still experimental web interface to begin the exercise.
Limit your search to the SwissProt database. At each cycle, record the E-values produced by the proteins XPG_XENLA, FEN1_HUMAN and DIN7_YEAST. Present these E-values in a table (3 cycles vs 3 proteins) and try to explain what you observe.
as query and look at the E-value produced by DPO1_ECOLI and EX9_ECOLI.
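The table of E-values recorded over the PSI-BLAST cycles can be laid out with a few lines of code. This is a minimal sketch: the function name `evalue_table` is an illustrative choice, and the `records` dictionary is deliberately left empty because the values must come from your own searches. A protein that is not reported in a given cycle is shown as "not found".

```python
def evalue_table(records, proteins, cycles=(1, 2, 3)):
    """Format E-values as a plain-text table, one row per protein.

    `records` maps (protein, cycle) to the E-value read from the output.
    """
    rows = [["Protein"] + ["Cycle %d" % c for c in cycles]]
    for protein in proteins:
        row = [protein]
        for cycle in cycles:
            value = records.get((protein, cycle))
            # A missing entry means the protein was not reported.
            row.append("not found" if value is None else "%.1e" % value)
        rows.append(row)
    # Pad every column to the width of its longest cell.
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in rows
    )


# Fill in as you go, e.g. records[("FEN1_HUMAN", 1)] = <E-value you read>.
records = {}
print(evalue_table(records, ["XPG_XENLA", "FEN1_HUMAN", "DIN7_YEAST"]))
```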
What Multiple Sequence Alignment
The alignment of the two sequences below was deduced from the structures of two glutathione S-transferases. The spatial coordinates of the alpha-carbon atoms of both crystal structures were taken into account to produce the alignment. The nature of each amino acid was not considered.
Paste this alignment into the query form of our still experimental web interface and launch a search against SwissProt, restricting the taxonomic range to Bacteria.
In a second browser window, paste the alignment into the text area of the MSA hub. Re-align the sequences using a sequence-based method such as Clustal-W. Then upload the re-aligned sequences into the PSI-BLAST query form and launch the search against SwissProt, restricting the taxonomic range to Bacteria.
Compare the two PSI-BLAST outputs. Use the "more about the selected proteins" button to verify with SEView in the Protein Hub that all matched proteins are actually related to glutathione S-transferases. You can be quite confident in the predictions made by the Pfam and Prosite predictors.
Do one more cycle of PSI-BLAST in each window. Does this confirm your preliminary observations?