HMMs and Profiles

Slide presentation

The slides on HMMs and profiles are available in PDF format (2 slides per page).


Today we will build a protein profile (HMM and generalized profile) starting from scratch.

The first step is to retrieve the sequence we want to analyze. We will use the MLTD_ECOLI protein from E. coli.

Copy the sequence into your directory:

A first look at the protein

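Use Dotlet to compare the protein against itself. In a self-comparison the main diagonal is always present; extra diagonals parallel to it reveal internally repeated segments, and the overall pattern helps you delimit the regions of the protein.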
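Dotlet does this interactively; the underlying principle can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Dotlet's own code: a dot is placed at (i, j) when the windows starting at positions i and j share at least min_matches identical residues (window and min_matches are our own names for the window size and the stringency). A repeat then shows up as dots on a diagonal parallel to the main one, at offset j - i.

```python
def dot_plot(seq, window=5, min_matches=4):
    """Return the set of (i, j) window pairs sharing >= min_matches identical residues."""
    n = len(seq) - window + 1  # number of windows along the sequence
    dots = set()
    for i in range(n):
        for j in range(n):
            # count identical residues between the two windows
            matches = sum(a == b for a, b in zip(seq[i:i + window], seq[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def diagonal_counts(dots):
    """Count dots on each diagonal above the main one (offset j - i > 0)."""
    counts = {}
    for i, j in dots:
        if j > i:
            counts[j - i] = counts.get(j - i, 0) + 1
    return counts
```

For a toy sequence made of the same 10-residue motif written twice ("MKTAYIAKQR" * 2, not MLTD_ECOLI), diagonal_counts(dot_plot(toy)) gives {10: 6}: every off-diagonal dot falls on the diagonal at offset 10, the length of the repeat.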

PSI-BLAST search

Now that you have an idea of the regions of the protein, use each region as a separate query for a PSI-BLAST search.
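Once you have chosen region boundaries from the dot plot, each region has to be saved as its own FASTA query. A small helper, assuming 1-based inclusive boundaries (the function name, the header naming scheme and the 60-residue line width are our own choices, not part of the course material):

```python
def region_fasta(name, seq, regions, width=60):
    """Return one FASTA record per (start, end) region, 1-based and inclusive."""
    records = []
    for start, end in regions:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"region {start}-{end} outside 1-{len(seq)}")
        sub = seq[start - 1:end]  # convert to a 0-based Python slice
        lines = [sub[k:k + width] for k in range(0, len(sub), width)]
        records.append(f">{name}_{start}-{end}\n" + "\n".join(lines) + "\n")
    return records
```

For example, region_fasta("query", "ACDEFGHIKL", [(1, 4), (5, 10)]) returns [">query_1-4\nACDE\n", ">query_5-10\nFGHIKL\n"]; each record can then be written to its own file and submitted separately.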

Recover the PSI-BLAST results

We now retrieve all sequences from the PSI-BLAST results with an E-value below a chosen threshold.

To avoid doing this by hand (quite annoying!), use this little script.
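The course script itself is not reproduced here. As a hypothetical stand-in, the sketch below assumes the PSI-BLAST hits were saved in BLAST's tabular format (12 tab-separated columns, subject ID in column 2, E-value in column 11) and collects the subject IDs below the threshold; the default of 1e-3 is only an example. Fetching the actual sequences for those IDs is a separate step.

```python
def hits_below(lines, max_evalue=1e-3):
    """Subject IDs with E-value < max_evalue, in order of first appearance.

    Assumes BLAST tabular rows (12 tab-separated fields, E-value in field 11).
    Comment lines and other non-tabular lines are skipped, and a subject found
    again in a later PSI-BLAST round is reported only once."""
    found = []
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue
        subject, evalue = fields[1], float(fields[10])
        if evalue < max_evalue and subject not in seen:
            seen.add(subject)
            found.append(subject)
    return found
```

Because it accepts any iterable of lines, an open file handle can be passed directly: hits_below(open("psiblast.tab")), where psiblast.tab is a hypothetical file name.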

Build a multiple alignment

To build the multiple alignment we will use clustalw (some documentation here).
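ClustalW writes its alignment in a block format (.aln): a CLUSTAL header line, then blocks of "name  aligned-chunk" lines, each block followed by a conservation line. If you want to inspect the result programmatically, a minimal parser could look like this (read_clustal is a hypothetical helper; it assumes sequence names contain no spaces and ignores the optional residue counts at the end of lines):

```python
def read_clustal(lines):
    """Parse ClustalW .aln lines into {name: aligned_sequence}.

    Skips the CLUSTAL header, blank lines and conservation lines (which start
    with whitespace); chunks from successive blocks are concatenated per name."""
    aln = {}
    for line in lines:
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]
        aln[name] = aln.get(name, "") + chunk
    return aln
```

Since dictionaries keep insertion order, the sequences come back in the order ClustalW listed them.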

Build the profile

Open the alignment corresponding to the N-terminal region of your protein in Jalview and redefine its N- and C-terminal ends by trimming the extra columns (use "Remove Left" and "Remove Right" in the Edit menu).

Save the new alignment in MSF format as 'profile.msf'.
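The trimming step amounts to keeping one contiguous range of alignment columns for every sequence. A minimal sketch, with a hypothetical function name and 1-based inclusive column numbers:

```python
def trim_columns(aln, start, end):
    """Keep alignment columns start..end (1-based, inclusive) for every sequence."""
    lengths = {len(s) for s in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = lengths.pop()
    if not 1 <= start <= end <= width:
        raise ValueError(f"columns {start}-{end} outside 1-{width}")
    # slice the same column range out of every aligned sequence
    return {name: s[start - 1:end] for name, s in aln.items()}
```

For example, trim_columns({"a": "--MKTAY--", "b": "MMMKTAYLL"}, 3, 7) keeps only the shared core, giving {"a": "MKTAY", "b": "MKTAY"}.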

Now we will build the profile using the HMMER2 package.
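To give an idea of what a profile contains, here is a deliberately simplified sketch, not what HMMER2's hmmbuild actually computes. Columns with fewer than half gaps are treated as match states, and each match column gets log-odds scores log2(p(a)/q(a)), with p estimated from residue counts plus a pseudocount of 1 and q a uniform background of 1/20. hmmbuild additionally models insert and delete transitions, weights sequences and uses more elaborate priors, so its choice of match columns and its scores will differ.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def match_columns(aln, max_gap_fraction=0.5):
    """Indices of columns with fewer than max_gap_fraction gaps (the match states)."""
    seqs = list(aln.values())
    cols = []
    for c in range(len(seqs[0])):
        gaps = sum(s[c] in "-." for s in seqs)
        if gaps / len(seqs) < max_gap_fraction:
            cols.append(c)
    return cols


def log_odds_profile(aln, pseudocount=1.0):
    """One {residue: log2(p/q)} dict per match column; uniform background q = 1/20."""
    seqs = list(aln.values())
    background = 1.0 / len(AMINO)
    profile = []
    for c in match_columns(aln):
        counts = {a: pseudocount for a in AMINO}
        for s in seqs:
            if s[c] in counts:  # gaps and non-standard letters are not counted
                counts[s[c]] += 1
        total = sum(counts.values())
        profile.append({a: math.log2((counts[a] / total) / background) for a in AMINO})
    return profile
```

For instance, in a column where three sequences all have M, the score for M is log2((4/23)/(1/20)), about 1.80 bits, while every residue not seen in that column gets a negative score.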

Rebuild an alignment

We can use the sequences found by hmmsearch to rebuild a profile.

Retrieve the sequences listed in the file hmmsearch.output with the script, saving them as profile2.fasta (hmmsearch.output > profile2.fasta).
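The script's name is not given here. For illustration only, the sketch below shows how hit names could be pulled from the per-sequence score table of a HMMER2-style hmmsearch report. It assumes the table follows a line starting with "Scores for complete sequences", that a dashed line separates the column header from the rows, that each row ends with score, E-value and domain count, and that a blank line ends the table. The E-value threshold is again only an example, and retrieving the sequences themselves from the database is not shown.

```python
def hmmsearch_hits(lines, max_evalue=1e-3):
    """Names of sequences in the per-sequence score table with E-value < max_evalue.

    Assumed row layout: name, free-text description, score, E-value, N."""
    hits, in_table, started = [], False, False
    for line in lines:
        if line.startswith("Scores for complete sequences"):
            in_table = True
            continue
        if not in_table:
            continue
        if line.startswith("--------"):
            started = True  # dashed line: the rows start on the next line
            continue
        if not started:
            continue  # column header line
        if not line.strip():
            break  # a blank line ends the table
        fields = line.split()
        try:
            evalue = float(fields[-2])  # E-value is the second-to-last field
        except (ValueError, IndexError):
            continue  # e.g. "[no hits above thresholds]"
        if evalue < max_evalue:
            hits.append(fields[0])
    return hits
```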

Align the sequences to the existing profile:

hmmalign -q profile.hmm profile2.fasta | selex2f > profile2.aln
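The selex2f filter is part of the course material and its exact output format is not described here. Assuming, as the pipe suggests, that hmmalign writes SELEX format (one "name aligned-chunk" line per sequence, blocks separated by blank lines, lines starting with # for comments and annotation), a hypothetical converter to aligned FASTA could look like this; it assumes gaps are written as "." or "-" rather than spaces.

```python
def selex_to_fasta(lines):
    """Convert SELEX alignment lines into aligned FASTA text."""
    seqs = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank block separator or comment/annotation (#=SQ, #=RF, ...)
        parts = line.split()
        if len(parts) < 2:
            continue  # a name with no residues carries nothing to add
        name, chunk = parts[0], parts[1]
        seqs[name] = seqs.get(name, "") + chunk  # concatenate blocks per sequence
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())
```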

Phew! You deserve a break... yeah, LUNCH TIME!