EST clustering

Exercise 1: Cleaning sequences

A little cluster

Here are 4 mouse sequences that you will analyze and clean. Copy them into a local file (you will need to edit them manually).

Using LALIGN, check manually whether the 4 sequences could belong to the same cluster.

Do a similar but automated clustering with CAP.
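To make the idea of automated clustering concrete, here is a minimal sketch of single-linkage clustering. It is not how CAP works internally: CAP builds real alignments and a consensus, whereas this toy links two sequences when they share enough exact k-mers. The function names, the thresholds (`k`, `min_shared`) and the four toy sequences are all assumptions made for illustration, and reverse-complement matches are ignored.

```python
from itertools import combinations


def shared_kmers(a, b, k=8):
    """Count distinct k-mers present in both sequences (a crude overlap proxy)."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


def cluster(seqs, k=8, min_shared=5):
    """Single-linkage clustering of a {name: sequence} dict via union-find."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair that shares at least `min_shared` k-mers.
    for a, b in combinations(seqs, 2):
        if shared_kmers(seqs[a], seqs[b], k) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy data (made up): three overlapping fragments of one string, plus an outsider.
base = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGTCAGT"
toy = {
    "est1": base[0:25],
    "est2": base[10:35],
    "est3": base[20:45],
    "est4": "TTTTGGGGCCCCAAAATTTTGGGGCC",
}
clusters = sorted(sorted(c) for c in cluster(toy))
print(clusters)
```

Note that est1 and est3 barely overlap, yet they end up together because both overlap est2. This transitive joining is also why uncleaned vector or repeat sequence can glue unrelated ESTs into one cluster.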


Perform vector clipping:
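As an illustration of what clipping means, the sketch below removes a leading stretch of a read that exactly matches the end of a cloning vector. Real vector-screening tools align reads against a vector database and tolerate mismatches; the function name, the `min_match` threshold and the toy sequences here are assumptions, not part of the exercise data.

```python
def clip_5prime_vector(read, vector, min_match=10):
    """Remove a leading stretch of `read` that exactly matches the end of `vector`."""
    read_u, vector_u = read.upper(), vector.upper()
    # Try the longest possible vector suffix first, then shorter ones.
    for length in range(min(len(read_u), len(vector_u)), min_match - 1, -1):
        if read_u.startswith(vector_u[-length:]):
            return read[length:]
    return read  # no vector detected at the 5' end


# Toy example: the read starts with the last 15 bases of the toy vector.
toy_vector = "GAATTCGAGCTCGGTACCCGGGGATCC"
toy_insert = "ATGGCGTACGTTAGCC"
toy_read = toy_vector[-15:] + toy_insert

clipped = clip_5prime_vector(toy_read, toy_vector)
print(clipped)
```

The same idea applies at the 3' end by matching the end of the read against the start of the vector.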

Repeat masking
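The sketch below shows the principle of hard masking: every position covered by a known repeat is replaced by N so that it can no longer seed a match between unrelated ESTs. Real repeat-masking programs compare against a curated repeat library using sensitive alignment; this toy version only finds exact occurrences, and the function name, the one-entry library and the sequence are made up for illustration.

```python
import re


def mask_repeats(seq, repeat_library, mask_char="N"):
    """Replace every exact occurrence of a library repeat in `seq` with mask_char."""
    masked = list(seq)
    upper = seq.upper()
    for repeat in repeat_library:
        # A lookahead finds overlapping occurrences as well.
        pattern = "(?=" + re.escape(repeat.upper()) + ")"
        for match in re.finditer(pattern, upper):
            start = match.start()
            masked[start:start + len(repeat)] = mask_char * len(repeat)
    return "".join(masked)


# Toy example: a (CA)n simple repeat embedded in unique sequence.
toy_seq = "ATGGCGTACG" + "CACACACACACA" + "TTAGCCTAGG"
masked_seq = mask_repeats(toy_seq, ["CACACACA"])
print(masked_seq)
```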

Now try to re-cluster the cleaned sequences using LALIGN or CAP (choose the fastest solution).

What can you say?

Genomic Mapping

Cluster the 7 rat sequences from UniGene cluster Rn.43270 using CAP, then map the resulting contig onto the genomic sequence (AC094146) with Spidey or Sim4.

What can you say? What about the mouse ortholog (U20225)?

Exercise 2: Gene Indices


Go to the UniGene web page.

You can search the Gene Index by keywords.

Have a detailed look at cluster Hs.69547 (don't hesitate to click around; there is a lot of information!).

You can also browse the different EST libraries by clicking Library Browser in the Human Gene Index home page.

Digital Differential Display (DDD) allows you to perform in silico differential gene expression analysis.
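The principle behind such an in silico comparison can be sketched as follows: for one gene cluster, count how many ESTs fall in the cluster versus elsewhere in each of two libraries, then ask whether the two proportions differ more than chance would explain. A common choice for this kind of 2x2 count table is Fisher's exact test, implemented below from scratch. This is a toy illustration, not the DDD code; the function name and all counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two libraries; columns are ESTs in the cluster of interest
    and all other ESTs of that library.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x cluster ESTs in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 30 of 10000 ESTs in library A, 5 of 12000 in library B.
p_value = fisher_exact_two_sided(30, 10000 - 30, 5, 12000 - 5)
print("p = %.3g" % p_value)
```

A small p-value suggests that the cluster is represented differently in the two libraries, keeping in mind that EST counts also depend on library size and on whether a library was normalized.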

TIGR Gene Indices

Enter the TIGR Gene Indices web page.

  • Follow the Gene Product Name link to search the gene index by keyword.
  • Use the keyword 'myelin' for the search.
  • Look in more detail at the report for THC1093817.

    Look at the Functional Classification based on the Gene Ontology Assignments. All TIGR Gene Indices are classified using the Gene Ontology vocabulary.