Introduction to Biological databases - Good luck in cyberspace!

The major molecular biology servers:
European Bioinformatics Institute (EBI)
National Center for Biotechnology Information (NCBI)
Japanese GenomeNet

1 Database hunting - [answers]

Try to find a live (currently maintained) database, with its corresponding ‘home’ server address and the date of its latest update, dealing with:

- nucleic acid sequences (in USA)
- microarray data
- mass spectrometry data
- protein-protein interactions
- rat enamel 2D gel electrophoresis
- bacterial 16S rRNA

2 Human erythropoietin (EPO) - [answers]

Find, if it exists, a human erythropoietin sequence in each of the following sequence databases:
(Look at the server which provides the most up-to-date database)

- EMBL (find an entry with an annotated CDS)
- RefSeq (RNA and protein sequences)
- UniProtKB/TrEMBL
- UniProtKB/Swiss-Prot
- UniParc (use SRS at EBI or the UniParc direct query tool)

3 Nucleic acid sequence databases - [answers]

a. Find the RefSeq “Constructed” entry for human chromosome Y
(Trick: the entries are numbered according to the chromosome number, e.g. for chromosome 1: NC_000001)
b. How many gaps are left in the sequence of chromosome Y?
c. Find the telomere sequence (5' end)
d. Do the same, starting from EnsEMBL
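
For those who prefer scripting to clicking, the numbering trick in (a) and the gap count in (b) can be sketched in Python. This is only an illustration under stated assumptions: the function names are made up for this sheet, X and Y are assumed to continue the numbering as 23 and 24, gaps are assumed to show up as runs of 'N' in the downloaded sequence, and the toy sequence is invented (it is not real chromosome data).

```python
import re


def refseq_chromosome_accession(chromosome):
    """Build the unversioned RefSeq accession for a human chromosome.

    Follows the numbering trick from the exercise (chromosome 1 -> NC_000001).
    Assumption: X and Y continue the numbering as 23 and 24.
    """
    special = {"X": 23, "Y": 24}
    key = str(chromosome).upper()
    number = special.get(key)
    if number is None:
        number = int(key)
    return "NC_%06d" % number


def find_gaps(sequence, min_length=1):
    """Return (start, end) pairs, 0-based and end-exclusive, for each run of N."""
    pattern = re.compile(r"N{%d,}" % min_length, re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Invented toy sequence with two gaps, NOT real chromosome data.
toy = "NNNNCTAACCCTAACCCNNNNNNNNGATTACA"
print(refseq_chromosome_accession(1))
print(find_gaps(toy))
```

On a real entry, compare the number of runs found this way with the gap features annotated in the database record; the two counts need not agree if short ambiguous stretches are also written as N.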

4 Protein sequence databases - [answers]

4.1 Starting with the ExPASy server:

a. Look for the amino acid sequence of human carbonic anhydrase 2.
b. Get the corresponding nucleic acid entries in EMBL and GenBank: try to find a nucleic acid sequence derived from genomic DNA sequencing and another one derived from cDNA sequencing.
c. Look for the chromosomal localisation of human carbonic anhydrase 2 using OMIM or GeneLynx.
d. From the UniProtKB/Swiss-Prot entry, look at the data available for the variant Pro-92, in particular its position in the 3D structure (use the “Astex viewer”).
e. Is there a maize carbonic anhydrase, or a Drosophila carbonic anhydrase? If so, in which protein sequence database?

4.2 Starting with the NCBI server:

Look for the amino acid sequence of human carbonic anhydrase 2 using Entrez Protein at the NCBI server.
Find the UniProtKB/Swiss-Prot entry and as above:
- Get the corresponding nucleic acid entries in EMBL and GenBank: find the nucleic acid sequence of a genomic DNA and that of a cDNA.
- Find the data available for the variant Pro-91.

5 Environmental sequences - [answers]

How many environmental sequences (DNA) are found in the nucleic acid databases (use SRS at the EBI)?
Look at AB036433: where does the sequence come from? How reliable is the CDS?

6 Genomic databases (I) - [answers]

a. Look for the UniProtKB/Swiss-Prot entry of the yeast gene RPL36B
b. Follow the link to SGD (Saccharomyces Genome Database) and find the chromosomal location
c. Get the SGD entry of the second gene upstream (in 5') on the same chromosomal strand
d. Follow the link to UniProtKB/Swiss-Prot
e. Find the subcellular localisation of the protein
f. Have a look at the domain structure in the different domain databases. In ProDom, get the list of proteins with at least one domain in common.

7 Genomic databases (II) - [answers]

a. Find the IL-2R alpha gene in the OMIM database. What is its chromosomal location?
b. View the cytogenetic maps of the regions surrounding the gene locus
c. Are there known diseases associated with this gene? What are the clinical synopses?
d. From OMIM (IL-2R alpha), follow the cross-reference to Entrez Gene (formerly LocusLink). Have a look at the RefSeq sequence.
e. Find the corresponding UniProtKB/Swiss-Prot entry

8 Protein domain / family databases - [answers]

a. How many different databases are used by InterPro?
b. What are the most frequent domains found in Drosophila? Compare with Homo sapiens.
(Go to the integr8 site:
c. Do an InterPro scan with the protein sequence available (see annexe)
d. How many different domains does the protein contain?
e. How many ankyrin repeats does the protein contain?
f. How many different protein domain databases have a discriminator for the ankyrin domain? Do they use a pattern, a profile or an HMM?
g. Have a look at the PROSITE pattern/signature describing the aldehyde dehydrogenase domain. What is the percentage of ‘known missed hits’?
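
To get a feeling for what a "pattern" discriminator is (as opposed to a profile or an HMM), here is a rough Python sketch that turns a PROSITE-style pattern into a regular expression. It is a simplified illustration, not the PROSITE implementation: the function name is made up, only the basic syntax is handled (x, [..], {..}, repeats such as x(2,4), and the < and > anchors at the ends of the pattern), and the test peptide is invented. The pattern used is the well-known N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    elements = pattern.strip().rstrip(".").split("-")
    parts = []
    for element in elements:
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the element into its core and an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                  # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # Single letters and [ABC] sets are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)
# Invented toy peptide, used only to exercise the regex.
print(re.findall(regex, "MKNGSAANPTQ"))
```

A pattern like this either matches or it does not; profiles and HMMs instead give a score, which is why the different databases can disagree on borderline sequences.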

9 Metabolic / Enzyme databases - [answers]

a. Go to the GenomeNet server in Japan
b. Find a database called KEGG
c. In KEGG, find the enzyme number EC
d. Have a look at the BRENDA database from the KEGG entry.
e. Get from KEGG the ENZYME entry in ExPASy, then from ENZYME the UniProtKB/Swiss-Prot entry.

10 3D structure database - [answers]

a. Find the entry 1IWO at PDB.
b. Look at the complete coordinates of the entry (by clicking on Download/Display PDB file). Find the “DBREF” line, which gives the cross-reference to UniProtKB/Swiss-Prot.
c. Try to visualize the structure by using Quick PDB or Jmol.
d. Can you identify the transmembrane helices? How many are there? How are they annotated in the corresponding UniProtKB/Swiss-Prot entry?
e. How many PDB entries are there for T4 lysozyme?
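
If you have saved the PDB file locally, the DBREF lookup in (b) can also be done with a few lines of Python. This is a minimal sketch with stated assumptions: the function names are made up, the record is split on whitespace (which works only when all ten fields are present and separated by spaces, with no insertion codes), "UNP" is taken as the database code for UniProt, and the sample text is invented rather than copied from a real entry.

```python
def parse_dbref(line):
    """Parse one whitespace-separated DBREF record; return None if unexpected."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "DBREF":
        return None
    return {
        "pdb_id": fields[1],
        "chain": fields[2],
        "database": fields[5],
        "accession": fields[6],
        "entry_name": fields[7],
    }


def uniprot_cross_references(pdb_text):
    """Collect the DBREF records of a PDB file that point to UniProt."""
    references = []
    for line in pdb_text.splitlines():
        # The trailing space skips the DBREF1/DBREF2 continuation records.
        if line.startswith("DBREF "):
            record = parse_dbref(line)
            if record is not None and record["database"] == "UNP":
                references.append(record)
    return references


# Invented sample text, NOT taken from a real PDB entry.
sample = """HEADER    EXAMPLE
DBREF  1ABC A    1   120  UNP    P12345   EXMPL_HUMAN      1    120
ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N
"""
for ref in uniprot_cross_references(sample):
    print(ref["chain"], ref["accession"], ref["entry_name"])
```

For real files, slicing the fixed columns defined in the PDB format description is more robust than splitting on whitespace.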

11 Gene ontology database - [answers]

a. Go to the Gene Ontology Consortium site
b. Look for the graphical view of the insulin receptor (INSR)

12 Protein-protein interaction databases - [answers]

Look at the 'interactome' of human p53: compare the results (and the graphical views) from DIP, IntAct and BIND.

13 Publication databases - [answers]

a. How many papers did Nature publish in 1995?
b. Find the publication dealing with Dolly's death and find its DOI number.
c. Get the publication thanks to this DOI number on the site
d. How many articles deal with Viagra? What is the 'generic' name of the molecule?
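
The same questions can be asked programmatically. The sketch below only builds the URLs, without sending any request. It assumes that PubMed is the publication database being queried, uses the NCBI E-utilities esearch address and the doi.org resolver, and the function names, the example query and the DOI shown are placeholders for illustration (the DOI is not a real article); check the field tags against the PubMed help before relying on the counts.

```python
from urllib.parse import quote, urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(term):
    """Build an esearch URL that asks PubMed only for the number of hits."""
    query = urlencode({"db": "pubmed", "term": term, "rettype": "count"})
    return EUTILS_ESEARCH + "?" + query


def doi_url(doi):
    """Build the resolver URL that redirects to the publisher's page for a DOI."""
    return "https://doi.org/" + quote(doi, safe="/")


# Example query: journal name and publication year as PubMed field tags.
print(pubmed_count_url("Nature[Journal] AND 1995[pdat]"))
# Placeholder DOI for illustration only.
print(doi_url("10.1000/xyz123"))
```

Pasting either URL into a browser is enough to see the result; the first returns a small XML document containing the count.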

14 SRS tutorial - [answers]

If you still have some time :---) do the SRS tutorial at:
ht, srsuser.pdf