HMMs and Profiles


Slide presentation

The slide presentation on HMMs and profiles, in PDF format (2 slides per page).


Exercise

Today we will build a protein profile (HMM and generalized profile) starting from scratch.

The first step is to retrieve the sequence we want to analyze: the MLTD_ECOLI protein from E. coli.

Copy the sequence into your working directory:


A first look at the protein

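Use Dotlet to compare your protein against itself. In a self-comparison the main diagonal is always present; internal repeats appear as shorter diagonals parallel to it, which helps you delimit the regions of the protein.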
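To see what a dot plot actually computes, here is a minimal sketch of a sliding-window self-comparison. It is a simplification, not Dotlet's method: windows are scored by counting identical residues only (Dotlet offers substitution matrices and interactive contrast settings), and the window size, threshold, and toy sequence are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=5, threshold=3):
    """Return a boolean matrix: cell [i][j] is True when the windows starting at
    seq_a[i] and seq_b[j] share at least `threshold` identical positions."""
    rows = max(len(seq_a) - window + 1, 0)
    cols = max(len(seq_b) - window + 1, 0)
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count position-by-position identities inside the two windows.
            identities = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(identities >= threshold)
        matrix.append(row)
    return matrix


def render(matrix):
    """Draw the dot plot as text: '*' for a hit, '.' otherwise."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in matrix)


if __name__ == "__main__":
    # Hypothetical toy sequence with one internal repeat (not MLTD_ECOLI).
    seq = "MKTAYIAKQRGSSGMKTAYIAKQR"
    print(render(dot_plot(seq, seq, window=5, threshold=5)))
```

In the printed plot the repeated segment shows up as a short diagonal shifted 14 positions off the main diagonal, on both sides of it.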


PSI-BLAST search

Now that you have an idea of the regions of the protein, use each region separately as a query for a PSI-BLAST search.
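Each region needs its own query file. A minimal sketch, assuming 1-based inclusive region boundaries that you read off the dot plot; the function name, the header naming scheme, and the example sequence and boundaries are all hypothetical.

```python
import textwrap


def region_queries(name, seq, regions, line_width=60):
    """Return {region: FASTA text} for each (start, end) region, 1-based inclusive."""
    queries = {}
    for region, (start, end) in regions.items():
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"region {region} ({start}-{end}) is outside the sequence")
        sub = seq[start - 1:end]
        # Wrap the subsequence to a fixed line width, as FASTA files usually are.
        body = "\n".join(textwrap.wrap(sub, line_width))
        queries[region] = f">{name}_{region}_{start}-{end}\n{body}\n"
    return queries


if __name__ == "__main__":
    # Hypothetical sequence and boundaries: use your own sequence and regions.
    sequence = "A" * 80 + "C" * 40
    boundaries = {"Nterm": (1, 80), "Cterm": (81, 120)}
    for region, fasta in region_queries("query", sequence, boundaries).items():
        print(fasta, end="")
```

Each returned text can be saved to its own file and submitted as a separate PSI-BLAST query.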


Recover the PSI-BLAST results

Next, retrieve all sequences from the PSI-BLAST results whose E-value is below a chosen threshold.

To avoid doing this by hand (quite annoying!), use the small script extract_psi_seq.pl.
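The core of this step is a filter on E-values. The sketch below is not extract_psi_seq.pl (whose input format and options are not described here); it assumes BLAST tabular output (`-m 8` in legacy blastpgp, `-outfmt 6` in BLAST+), where column 2 is the subject ID and column 11 the E-value. It only selects IDs; fetching the full sequences is a separate step.

```python
def hits_below(tabular_lines, max_evalue):
    """Return subject IDs with an E-value below max_evalue, in first-seen order.

    Assumes BLAST tabular output: 12 tab-separated columns, E-value in column 11.
    """
    selected, seen = [], set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a hit row
        subject, evalue = fields[1], float(fields[10])
        # PSI-BLAST may report the same subject again in later iterations.
        if evalue < max_evalue and subject not in seen:
            seen.add(subject)
            selected.append(subject)
    return selected


if __name__ == "__main__":
    # Hypothetical rows: query, subject, %id, length, mismatches, gap opens,
    # q.start, q.end, s.start, s.end, E-value, bit score.
    rows = [
        "q\tsp|A\t90\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150",
        "q\tsp|B\t30\t80\t50\t2\t1\t80\t5\t85\t0.5\t30",
    ]
    print(hits_below(rows, 1e-3))
```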


Build a multiple alignment

To build the multiple alignment we will use clustalw (see the ClustalW documentation).
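ClustalW aligns progressively: it first computes pairwise distances, builds a guide tree from them, then adds sequences following the tree. As a sketch of that first stage only, here is a plain Needleman-Wunsch global alignment with identity scoring and a linear gap penalty, plus a distance defined as 1 minus the fraction of identical positions among gap-free aligned pairs. This is a simplification: ClustalW uses substitution matrices and affine gap penalties, and the scores below are hypothetical defaults.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical residues among gap-free aligned pairs."""
    _, x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
```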


Build the profile

Open the alignment corresponding to the N-terminal region of your protein in Jalview and redefine where the alignment starts and ends by trimming its N- and C-terminal columns (use "Remove Left" and "Remove Right" in the Edit menu).

Save the new alignment in MSF format as 'profile.msf'.
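The trimming done in Jalview amounts to keeping one block of alignment columns in every sequence. A minimal sketch, assuming the alignment is held as a dict of equal-length gapped strings and that the column range (1-based, inclusive) is chosen by you; it does not read or write MSF, which Jalview handles.

```python
def trim_columns(alignment, start, end):
    """Keep alignment columns start..end (1-based, inclusive) in every sequence."""
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    if not 1 <= start <= end <= length:
        raise ValueError(f"invalid column range {start}-{end} for length {length}")
    return {name: seq[start - 1:end] for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical alignment with ragged ends.
    aln = {"seq1": "--ACDE--", "seq2": "MKACDEFG"}
    print(trim_columns(aln, 3, 6))
```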

Now we will build the profile HMM using the HMMER2 package.
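To make the idea behind building the model concrete, here is a deliberately simplified sketch, not what hmmbuild does internally. It marks as match columns those with fewer than 50% gaps, estimates each match state's residue probabilities with add-one (Laplace) pseudocounts, and converts them to log2-odds scores against a uniform background, which is essentially a position-specific scoring matrix in the spirit of a generalized profile. hmmbuild uses a more careful model-construction procedure, estimates transition probabilities, and applies Dirichlet-mixture priors; the gap threshold, pseudocount, and background here are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = set("-.")


def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of columns whose gap fraction is below max_gap_fraction."""
    n_seqs = len(alignment)
    return [
        c for c in range(len(alignment[0]))
        if sum(seq[c] in GAP_CHARS for seq in alignment) / n_seqs < max_gap_fraction
    ]


def match_emissions(alignment, pseudocount=1.0, max_gap_fraction=0.5):
    """Per-match-column residue probabilities with add-`pseudocount` smoothing."""
    emissions = []
    for c in match_columns(alignment, max_gap_fraction):
        residues = [seq[c].upper() for seq in alignment]
        counts = {aa: residues.count(aa) for aa in AMINO_ACIDS}
        total = sum(counts.values())  # gaps and non-standard letters are ignored
        denom = total + pseudocount * len(AMINO_ACIDS)
        emissions.append({aa: (counts[aa] + pseudocount) / denom for aa in AMINO_ACIDS})
    return emissions


def log_odds(emissions, background=None):
    """log2(p / q) scores per match column; uniform background q by default."""
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return [{aa: math.log2(col[aa] / background[aa]) for aa in col} for col in emissions]


if __name__ == "__main__":
    # Hypothetical toy alignment: column 3 is mostly gaps, so it is an insert column.
    aln = ["AC-D", "AC-E", "A-GD"]
    print(match_columns(aln))
    print(round(log_odds(match_emissions(aln))[0]["A"], 3))
```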


Rebuild an alignment

We can use the sequences found by hmmsearch to rebuild a profile.

Retrieve the sequences from the file hmmsearch.output using the script hmmsearch_get_seq.pl:

hmmsearch_get_seq.pl hmmsearch.output > profile2.fasta
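Before retrieving sequences, you may want to look at which hits hmmsearch reported and with what E-values. The sketch below parses a per-sequence score table; the layout it expects ("Scores for complete sequences" heading, a dashed separator line, then rows ending in score, E-value and domain count, terminated by a blank line) is an assumption based on HMMER2-style output, so compare it with your own hmmsearch.output. It is not hmmsearch_get_seq.pl and does not fetch any sequences; the sample rows are hypothetical.

```python
def parse_hmmsearch_hits(text, max_evalue=float("inf")):
    """Return (name, score, evalue) tuples from the per-sequence score table."""
    hits = []
    lines = iter(text.splitlines())
    # Find the start of the per-sequence table.
    for line in lines:
        if line.startswith("Scores for complete sequences"):
            break
    else:
        return hits  # table not found
    # Skip the column header up to and including the dashed separator line.
    for line in lines:
        if line.startswith("--------"):
            break
    # Read hit rows until the table ends.
    for line in lines:
        if not line.strip() or line.strip().startswith("[no hits"):
            break
        fields = line.split()
        # The description may contain spaces, so read score/E-value from the end.
        name, score, evalue = fields[0], float(fields[-3]), float(fields[-2])
        if evalue < max_evalue:
            hits.append((name, score, evalue))
    return hits


SAMPLE = """Scores for complete sequences (score includes all domains):
Sequence  Description                 Score    E-value  N
--------  -----------                 -----    ------- ---
SEQ1      example protein one         250.3    4.1e-72   1
SEQ2      example protein two          12.0       0.35   1

Parse for domains:
"""

if __name__ == "__main__":
    print(parse_hmmsearch_hits(SAMPLE, max_evalue=0.01))
```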

Align the sequences to the existing profile:

hmmalign -q profile.hmm profile2.fasta | selex2f > profile2.aln
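hmmalign aligns each sequence to the full probabilistic model. As a simplified stand-in that shows the same idea with a generalized-profile-style model, here is a global dynamic-programming alignment of one sequence to a profile given as per-column log-odds scores, with a single linear gap penalty for both skipped profile columns and inserted residues. Following the convention HMMER uses for its alignments, matched residues are written in uppercase, inserted residues in lowercase, and skipped columns as '-'. The gap and unknown-residue penalties and the toy profile are hypothetical.

```python
def align_to_profile(seq, profile, gap=-4.0, unknown=-5.0):
    """Globally align `seq` to `profile` (a list of {residue: score} dicts).

    Returns (score, aligned_string): uppercase = residue in a profile column,
    lowercase = inserted residue, '-' = profile column with no residue.
    """
    seq = seq.upper()
    n, m = len(profile), len(seq)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading profile columns skipped
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading residues inserted

    def col_score(i, j):
        return profile[i - 1].get(seq[j - 1], unknown)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + col_score(i, j),
                              score[i - 1][j] + gap,   # column skipped
                              score[i][j - 1] + gap)   # residue inserted
    # Trace back one optimal path.
    i, j, out = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + col_score(i, j):
            out.append(seq[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out.append("-"); i -= 1
        else:
            out.append(seq[j - 1].lower()); j -= 1
    return score[n][m], "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical three-column profile favouring A, C and D.
    toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
    for s in ("ACD", "AD", "AKCD"):
        print(s, align_to_profile(s, toy_profile))
```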


Phew! You deserve a break... yes, LUNCH TIME!