EST clustering

Exercise 1: Cleaning sequences

A little clustering

Without any cleaning, we can expect the sequences to fall into the same cluster.
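To illustrate why uncleaned reads tend to collapse into one cluster, here is a minimal sketch of single-linkage clustering on shared k-mers. It is a toy stand-in, not the clustering program used in this exercise. The two inserts `read_a` and `read_b` are hypothetical; `SHARED_TAIL` is the 3' tail that seq2 and seq3 have in common in the sequences listed in this exercise. The function names, the value `k=12`, and the choice to ignore the reverse strand are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-mers of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(seqs, k=12):
    """Single-linkage clustering: two reads are linked if they share a k-mer."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # k-mer -> first read in which it was seen
    for name, seq in seqs.items():
        for kmer in kmers(seq, k):
            if kmer in first_owner:
                parent[find(name)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = name

    groups = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    return sorted(sorted(group) for group in groups.values())


# 3' tail shared by seq2 and seq3 in this exercise.
SHARED_TAIL = "ACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCA"
# Hypothetical, unrelated inserts (not taken from the exercise data).
read_a = "ATGGCGTACCTGAAGTTCGA"
read_b = "CCATTGGACGGTTAACGCTT"

dirty = {"read_a": read_a + SHARED_TAIL, "read_b": read_b + SHARED_TAIL}
clean = {"read_a": read_a, "read_b": read_b}

print(cluster_by_shared_kmers(dirty))  # [['read_a', 'read_b']]
print(cluster_by_shared_kmers(clean))  # [['read_a'], ['read_b']]
```

The shared tail alone is enough to link two otherwise unrelated reads, which is why the cleaning steps come before clustering.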

Vector clipping

Seq2 is contaminated by phage M13 vector sequence at its 3' end. Remove the contaminating sequence:

>seq2
TATAAATACAAATACGTATACATGTCTATTATAATGAAAAATTGCCAATCTTGTTTAAGCAAATGCATTCTATCGTTATTATAAATGTTAGTTCTAGCTTTATTTACTTCAAAATCTTAAATCAGAATAAATTAATATTGTATTGCTGCTGTGCGTGGAAAAAGATGATGTTTATGTTCTTATAGAATAAAAGCTGTGGTTNTTTATTGTCTGTCTCCTCCACTAGANTGTAAGCTCCATGAGGGCAGGGATTTTGTCTGTYTTGTTCACTGCTGTATCCCCAGCGCCTAGAACAGTGCCTGGCACATAGTAGGCGCTCAATAAATATTTGTTGAATGAATGAATGAATGAATACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCA

Seq3 is contaminated by phage M13 vector sequence at its 3' end. Remove the contaminating sequence:

>seq3
TGGCACTCACCCATGGCAGCACAGGATGCCATCTTCTTTGAGTGCTGTCGTAATGAGCTGGATTNTTTATTGTCTGTCTCCTCCACTAGANTGTAAGCTCCATGAGGGCAGGGATTTTGTCTGTYTTGTTCACTGCTGTATCCCCAGCGCCTAGAACAGTGCCTGGCACATAGTAGGCGCTCAATAAATATTTGTTGAATGAATGAATGAATAATAGGACTCCTACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCA
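The clipping step can be sketched as a simple seed search. This is a rough illustration only: dedicated vector-screening tools align reads against a vector database and tolerate mismatches, whereas this sketch requires an exact match. `M13_TAIL` is the 3' stretch that seq2 and seq3 share above, taken here to be the vector-derived part (an inference from comparing the sequences in this exercise, not an authoritative M13 reference). The function name `clip_3prime` and the `min_match` parameter are hypothetical.

```python
# 3' stretch common to seq2 and seq3 above; assumed here to be vector-derived.
M13_TAIL = "ACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCA"


def clip_3prime(seq, contaminant, min_match=12):
    """Cut seq at the first exact hit of the contaminant's leading bases.

    Everything from the hit to the 3' end is discarded. If the seed
    (the first min_match bases of the contaminant) is absent, seq is
    returned unchanged.
    """
    seed = contaminant[:min_match].upper()
    pos = seq.upper().find(seed)
    return seq if pos == -1 else seq[:pos]


# Last bases of seq3 (as listed above), including the contaminating tail.
seq3_end = "GAATAATAGGACTCCT" + M13_TAIL

print(clip_3prime(seq3_end, M13_TAIL))  # GAATAATAGGACTCCT
```

An exact seed is deliberately strict; with real ESTs, sequencing errors near the 3' end would usually call for an alignment-based search instead.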

Repeat masking

Seq2 contains a LINE/L2 repeat:

>seq2TATAAATACAAATACGTATACATGTCTATTATAATGAAAAATTGCCAATCTTGTTTAAGCAAATGCATTCTATCGTTATTATAAATGTTAGTTCTAGCTTTATTTACTTCAAAATCTTAAATCAGAATAAATTAATATTGTATTGCTGCTGTGCGTGGAAAAAGATGATGTTTATGTTCTTATAGAATAAAAGCTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGAAT

Seq3 contains a LINE/L2 repeat:

>seq3
TGGCACTCACCCATGGCAGCACAGGATGCCATCTTCTTTGAGTGCTGTCGTAATGAGCTGGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGAATAATAGGACTCCT

Seq4 contains a low-complexity region:

>seq4
NNNNNNNNNNNNNNNNNNNNNCATATCCCTACATCCCCTCCCTCCCACCACCCTATTCAATTGAGAGCAGGGAGATATTTTTTGGATTTCCTTTATACTGTGAAGTCACATGCATCGAAGGGTCAAACCTCTAGGTGCAGAAAAGGAAAAAAAAACCTATAAATACAAATCTGTATAGATGTCTATTATTATGAAAAATTCCCCATCTTGTTTAAGCAAATGCATTCTATCGTTATTATAAATGTTAGTTCTAGCTTTATTTACTTCCAAATCTTAAATCAGAATAAATTAATATTGTATTGCTGCTGTGCGTGGAAAAAGATGATGTTTATGTTCTTATAGAATAAAAGCTGTGGAACG
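In the masked sequences above, each annotated region is replaced base for base by N, so the sequence length is preserved. A minimal sketch of that hard-masking step follows, assuming the repeat or low-complexity coordinates are already known. The function name `hard_mask` and the 0-based, half-open `(start, end)` convention are assumptions for illustration; annotation tools often report 1-based inclusive coordinates, which would need converting first.

```python
def hard_mask(seq, intervals, mask_char="N"):
    """Replace every base inside the given intervals with mask_char.

    intervals: iterable of (start, end) pairs, 0-based and half-open
    (an assumed convention). Coordinates outside the sequence are clamped.
    """
    bases = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


toy = "ACGTACGTAC"                    # hypothetical 10-base sequence
masked = hard_mask(toy, [(2, 5)])
print(masked)                         # ACNNNCGTAC
print(len(masked) == len(toy))        # True
```

Keeping the length unchanged means that positions in the masked sequence still correspond to positions in the original read.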

A clean little clustering




Exercise 2: Gene Indices

UniGene

Statistics

Keyword search

Libraries

DDD (Digital Differential Display)

TIGR Gene Indices

Statistics