Answers for EST clustering
Clustering
- The dataset produces 5 clusters.
- Cluster1, cluster2, and cluster3 are each composed of several sequences.
- Cluster4 and cluster5 are singletons: they do not share enough sequence similarity with the other sequences to be clustered.
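One way to check which clusters are singletons is to count the FASTA records in each cluster file. The sketch below is only an illustration: the helper names and the sample records are hypothetical, and it assumes each cluster is stored as FASTA text with one ">" header line per sequence.

```python
def count_fasta_records(fasta_text):
    """Count sequences in FASTA text by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def find_singletons(clusters):
    """Return the sorted names of clusters that contain exactly one sequence.

    `clusters` maps a cluster name to its FASTA text (hypothetical layout).
    """
    return sorted(
        name for name, text in clusters.items()
        if count_fasta_records(text) == 1
    )


# Made-up example data, not taken from the exercise dataset.
example = {
    "clusterA": ">est1\nACGT\n>est2\nACGA\n",
    "clusterB": ">est3\nTTGA\n",
}
print(find_singletons(example))
```

In practice the FASTA text would be read from the cluster files written by the clustering step.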
Assembly
- Both Phrap and CAP produce three contigs.
- To compare the results you can use cross_match:
    cross_match -alignments cap3_est_out/cluster1.fasta.cap.contigs phrap_est_out/cluster1.fasta.contigs > contig_align
- Another good way to compare the results is to look at the log files, which also show which sequences were used for which contig.
- The programs also create a file of singlets (sequences that belong to a cluster but not to any contig).
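The singlets of a cluster can be thought of as a set difference: every sequence in the cluster minus every sequence that was placed in some contig. The sketch below shows that idea with made-up identifiers. It does not parse the actual Phrap or CAP log formats, and the function and variable names are hypothetical.

```python
def find_singlets(cluster_ids, contigs):
    """Return the sorted sequence IDs that are in the cluster but in no contig.

    `cluster_ids` is a list of all sequence IDs in the cluster.
    `contigs` maps a contig name to the list of sequence IDs assembled into it.
    """
    assembled = set()
    for members in contigs.values():
        assembled.update(members)
    return sorted(set(cluster_ids) - assembled)


# Made-up identifiers for illustration only.
cluster_ids = ["est1", "est2", "est3", "est4"]
contigs = {"Contig1": ["est1", "est2", "est3"]}
print(find_singlets(cluster_ids, contigs))
```

Running the same check on the membership reported by each assembler is a quick way to see whether the two programs left the same sequences out of their contigs.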
What is my sequence?
- The sequence corresponding to the contig produced by cluster1 is CD4.
- The sequence corresponding to the contig produced by cluster2 is p53.