Gene finding and structure prediction
The working sequence
Click here to get a vertebrate contig (20 kb) that will be used for the exercise. For the moment you don't know much about it, but hopefully you will be able to say more at the end of the exercise ;-)
Save the sequence on your local computer so that it is ready to be uploaded to the prediction programs or copied and pasted into them.
The goal of the exercise is to predict all complete genes contained in the sequence using gene prediction programs, EST searches, and species comparisons.
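Before submitting the sequence anywhere, it can help to check that the saved file parses as FASTA and has roughly the expected length (about 20 kb). The sketch below is only an illustration using the Python standard library: the function names and the tiny demo record are hypothetical, and the GC fraction is reported simply because predictor behaviour may depend on base composition.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


# Hypothetical toy record; in practice, read the text from the saved file.
example = ">contig_demo hypothetical record\nACGTNNAC\nGGCC\n"
records = read_fasta(example)
seq = records["contig_demo"]
print(len(seq), round(gc_content(seq), 3))  # prints: 12 0.7
```

For the real contig, replace the demo string with the contents of the saved file, for instance `open("contig.fa").read()`, where the file name is an assumption.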
Rules of the game:
Proceed in four phases, using increasing amounts of information not necessarily available for all genes:
- Phase 1: Predict coding regions and gene structures. Use only gene prediction programs and WWW servers that do not use sequence homology information. Pick two or three predictors from the list and try them. Compare the results. Also pay attention to the different services each web server provides (mark repeats, ...).
- GRAIL Oak Ridge Nat. Laboratory (US)
- MZEF Cold Spring Harbor Labs (US)
- HMMgene Center for Biological Sequence Analysis (Denmark)
- GENSCAN MIT (US)
- GENEMARK Georgia Institute of Technology (US)
- Genie Lawrence Berkeley Nat. Laboratory (US)
- Geneid Genome Informatics Research Lab (Spain)
- GeneBuilder Istituto Tecnologie Biomediche Avanzate (Italy)
- Phase 2: Extract the predicted coding regions and/or proteins from the original sequence using the tools available (see following list), or use the predicted genes/proteins directly from the program output. BLAST the predicted genes/proteins to find homologs (ESTs, proteins, cDNAs) that confirm the gene structure. Compare more closely the first gene predicted by Geneid with the one predicted by Genscan: what can you say?
- Phase 3: Homologs can be used to build an improved gene structure. You can analyze fragments of the sequence to avoid long waiting times.
- Wise2 Build a gene structure using a protein or HMM-profile as template. Maximum DNA size 6 kb in interactive mode.
- Phase 4: Compare your results to the annotation available at NCBI or EBI. Search your sequence using the BLAST services proposed by the different organizations. Explore the results.
- Ensembl Compare your predicted genes to the human and mouse ones in the Ensembl database.
- NCBI Compare your predicted genes to the human and mouse genomes at NCBI.
- NCBI-BLAST Compare your predicted genes to human and mouse ESTs.
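The extraction step of Phase 2 can also be done by hand once a predictor has reported exon coordinates. The following is a minimal sketch, assuming exons are given as 1-based inclusive (start, end) pairs on the forward strand of the contig and that the standard genetic code applies. The function names and the toy sequence are hypothetical and are not taken from any of the programs listed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(genomic, exons, strand="+"):
    """Join exons (1-based, inclusive, forward-strand coordinates).

    For a gene on the minus strand, the joined forward-strand segments
    are reverse-complemented to obtain the coding sequence.
    """
    joined = "".join(genomic[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(joined) if strand == "-" else joined


def translate(cds):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


# Toy example: two exons separated by an intron that starts GT and ends AG.
genomic = "CCATGGCTGTAAGTTTCAGGATTAACC"
exons = [(3, 8), (20, 25)]
cds = extract_cds(genomic, exons, strand="+")
print(cds)             # prints: ATGGCTGATTAA
print(translate(cds))  # prints: MAD*
```

The resulting protein string (without the trailing `*`) is what would be pasted into a protein BLAST search. Coordinates copied from a predictor's output should be checked against that program's own conventions before use.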
Questions:
- How accurate are gene prediction algorithms?
- Which gene prediction tool performed the best on your sequence?
- Which gene prediction tool can deal with multiple genes in one sequence?
- How useful are EST/protein searches for gene prediction?
- How useful are cross-species comparisons of genomic sequences for gene prediction?
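One way to put numbers on the first two questions is to score each predictor's coding exons against the reference annotation found in Phase 4. The sketch below assumes the convention common in the gene-finding literature, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP). It computes both at the nucleotide level and at the exact-exon level. The function names and the coordinates are hypothetical illustration data, not results for the exercise contig.

```python
def covered_positions(exons):
    """Set of all positions covered by (start, end) inclusive intervals."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) over coding nucleotides."""
    pred = covered_positions(predicted)
    true = covered_positions(annotated)
    tp = len(pred & true)  # nucleotides that are coding in both sets
    sn = tp / len(true) if true else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


def exon_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) counting only exact exon matches."""
    pred, true = set(predicted), set(annotated)
    exact = len(pred & true)  # exons with both boundaries correct
    sn = exact / len(true) if true else 0.0
    sp = exact / len(pred) if pred else 0.0
    return sn, sp


# Hypothetical coordinates for illustration only.
annotated = [(101, 200), (301, 400)]
predicted = [(101, 200), (321, 400), (501, 550)]

nt_sn, nt_sp = nucleotide_accuracy(predicted, annotated)
ex_sn, ex_sp = exon_accuracy(predicted, annotated)
print(f"nucleotide level: Sn={nt_sn:.2f} Sp={nt_sp:.2f}")
print(f"exon level:       Sn={ex_sn:.2f} Sp={ex_sp:.2f}")
```

Running the same functions on the output of each predictor gives a simple basis for comparing them on this sequence. Exon-level scores are usually the stricter of the two, because a single misplaced splice site turns an otherwise good exon into a miss.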