Protein Domains and PSI-blast

  1. Identification of Known Domains in a Protein Sequence

    What can you predict about this sequence

         MGIQGLAKLI ADVAPSAIRE NDIKSYFGRK VAIDASMSIY QFLIAVRQGG DVLQNEEGET
         TSHLMGMFYR TIRMMENGIK PVYVFDGKPP QLKSGELAKR SERRAEAEKQ LQQAQAAGAE
         QEVEKFTKRL VKVTKQHNDE CKHLLSLMGI PYLDAPSEAE ASCAALVKAG KVYAAATEDM
         DCLTFGSPVL MRHLTASEAK KLPIQEFHLS RILQELGLNQ EQFVDLCILL GSDYCESIRG
         IGPKRAVDLI QKHKSIEEIV RRLDPNKYPV PENWLHKEAH QLFLEPEVLD PESVELKWSE
         PNEEELIKFM CGEKQFSEER IRSGVKRLSK SRQGSTQGRL DDFFKVTGSL SSAKRKEPEP
         KGSTKKKAKT GAAGKFKRGK
    using the following Motif-Scan servers

    How much do these predictions differ from each other? Which server do you prefer, and why?

    Retrieve the SwissProt entry that corresponds to the above sequence and observe how the predictions of the different Motif-Scan servers are incorporated into the annotations of that entry.
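
    To make the idea of a motif scan concrete, here is a minimal sketch of the simplest case: translating a PROSITE-style pattern into a regular expression and scanning a sequence with it. The function names `prosite_to_regex` and `scan` are illustrative, not part of any server. The sketch handles only the basic pattern syntax (residues, `x`, `[...]`, `{...}` and repeat counts), not the `<` and `>` anchors. The Motif-Scan servers also rely on profiles and hidden Markov models, which this sketch does not attempt to reproduce.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat
        # count such as "(2)" or "(2,4)".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P}: any residue except P
        # "[ST]" and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def scan(sequence: str, pattern: str):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # First 60 residues of the exercise sequence, spaces removed.
    fragment = "MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSIYQFLIAVRQGGDVLQNEEGET"
    # N-glycosylation site pattern, used here only as a short example.
    print(scan(fragment, "N-{P}-[ST]-{P}"))
```

    Short patterns such as this one match by chance very often, which is one reason to compare the output of several servers rather than trust a single hit.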

  2. Protein Classification based on Domain Architecture

    Using the Hits protein workbench, retrieve all proteins in SwissProt that contain a match to the Prosite profiles 53EXO_N_DOMAIN and 53EXO_I_DOMAIN:

    Try to group these proteins into a few families by looking at their domain architecture. The most useful tool for this purpose is probably the sequence element viewer SEView. Two keys are particularly useful in establishing the classification:

    Does your classification reflect the IDs given by the SwissProt annotators? As this task is fairly time-consuming, you can get a pre-established classification here, but note that some domain definitions have been altered. Try to form your own idea before looking at this page.

    These two other links might help you sort out the different types of domain architecture found in these proteins. You must, however, supply one of the sequences as a query to launch them.
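
    The grouping step can also be sketched programmatically. The example below assumes that the domain matches have already been exported as (start, end, domain name) tuples per protein. The protein IDs, the coordinates and the name `OTHER_DOMAIN` are hypothetical placeholders, not real SwissProt data. Proteins are grouped by the ordered list of their domains from N-terminus to C-terminus.

```python
from collections import defaultdict

# Hypothetical matches: protein ID -> list of (start, end, domain name).
hits = {
    "PROT_A": [(1, 100, "53EXO_N_DOMAIN"), (120, 250, "53EXO_I_DOMAIN")],
    "PROT_B": [(130, 260, "53EXO_I_DOMAIN"), (5, 110, "53EXO_N_DOMAIN")],
    "PROT_C": [
        (1, 100, "53EXO_N_DOMAIN"),
        (120, 250, "53EXO_I_DOMAIN"),
        (300, 600, "OTHER_DOMAIN"),
    ],
}


def architecture(domain_hits):
    """Return the domain names ordered by start position along the sequence."""
    return tuple(name for _, _, name in sorted(domain_hits))


def group_by_architecture(all_hits):
    """Map each distinct domain architecture to the proteins that share it."""
    families = defaultdict(list)
    for protein, domain_hits in all_hits.items():
        families[architecture(domain_hits)].append(protein)
    return dict(families)


if __name__ == "__main__":
    for arch, proteins in group_by_architecture(hits).items():
        print(" + ".join(arch), "->", ", ".join(proteins))
```

    Grouping on the exact domain order is a deliberately strict choice. A classification done by eye in SEView may also take domain lengths and spacing into account.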

  3. PSI-blast Iteration

    The previous exercise gave you some knowledge of the names and domain architectures of a group of related proteins. You will now exploit this knowledge to observe the behaviour of PSI-blast. You can use either the NCBI web interface or our still experimental web interface to begin the exercise.

  4. Which Multiple Sequence Alignment?

    The alignment of the two sequences below was deduced from the structures of two glutathione S-transferases. Only the spatial coordinates of the alpha-carbon atoms of both crystal structures were used to produce the alignment. The identity of the amino acids was not considered.

    >1gul/1-217
    RPKLHYPNGRGRMESVRWVLAAAGVEFDEEFLET-KEQLYKLQDGNHLLFQQVPMVEIDGMKLVQTRSILHYIADKH----NLFGKNLKERTLIDMYVEGT----LDLLELLIMHPF----LKPDDQQKEVVNMAQKAIIRYFPVFEKILRGHGQSFLVGNQLSLADVILLQTILALEEKIPNILSAFPFLQEYTVKLSNIPTIKRFLEPGSKKKPPPDEIYVRTVYNIF
    >1ljr/1-244
    GLELFLDLVSQPSRAVYIFAKKNGIPLELRTVDLVKGQHKSKEFLQINSLGKLPTLKDGDFILTESSAILIYLSCKYQTPDHWYPSDLQARARVHEYLGWHADCIRGTFGIPLWVQVLGPLIGVQVPEEKVERNRTAMDQ-ALQWLEDKFLG-DRPFLAGQQVTLADLMALEELMQPVALGYELFEGRPRLAAWRGRVEAFLGAELCQEAHSIILSILEQAAKKTLPTPS

    Paste this alignment into the query form of our still experimental web interface and launch a search against SwissProt, restricting the taxonomic range to Bacteria.

    In a second browser window, paste the alignment into the text area of the MSA hub. Re-align the sequences using a sequence-based method such as Clustal-W. Then upload the re-aligned sequences into the PSI-blast query form and launch the search against SwissProt, restricting the taxonomic range to Bacteria.
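
    Before comparing the search results, it can help to quantify how the two input alignments differ. The sketch below reads a two-sequence alignment in FASTA format and computes the percent identity over the columns where neither sequence has a gap. The function names and the toy alignment are illustrative assumptions. Paste the structure-based alignment or the Clustal-W output in place of `toy` to compare them.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns with no gap in either row."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy alignment for illustration only; not the exercise data.
    toy = ">a\nAC-DE\n>b\nACGDF\n"
    seq_a, seq_b = read_fasta(toy).values()
    print(percent_identity(seq_a, seq_b))
```

    Other definitions of percent identity exist, for example dividing by the length of the shorter sequence. The choice made here is only one of them.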

    Compare the two PSI-blast outputs. Use the "more about the selected proteins" button to verify with SEView in the Protein Hub that all matched proteins are actually related to glutathione S-transferases. For this check, you can be quite confident in the predictions of the Pfam and Prosite predictors.

    Do one more cycle of PSI-BLAST in each window. Does this confirm your preliminary observations?
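
    If you copy the identifiers of the matched proteins from each window, the comparison can be made explicit with a few set operations. This is a minimal sketch: the function `compare_hits` and the identifiers in the example are hypothetical placeholders, not actual search results.

```python
def compare_hits(first, second):
    """Split two hit lists into shared hits and hits unique to each search."""
    first, second = set(first), set(second)
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


if __name__ == "__main__":
    # Hypothetical identifiers standing in for the two PSI-blast hit lists.
    structure_based = ["HIT_1", "HIT_2", "HIT_3"]
    sequence_based = ["HIT_2", "HIT_3", "HIT_4"]
    for label, ids in compare_hits(structure_based, sequence_based).items():
        print(label, ids)
```

    Running the same comparison after each additional iteration shows whether the two searches converge on the same set of proteins or drift apart.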


Marco Pagni
Last modified: Fri Aug 31 14:05:44 CEST 2001