Databases on the Internet - hints & answers


1-1a Go to SWISS-PROT, use full text search with carbonic anhydrase and pick human carbonic anhydrase 2 (CAH2_HUMAN).
1-1b-d follow the links
1-1e Yes for Drosophila, Yes for Maize (3 in TrEMBL)

1-2a Go to SWISS-PROT, use full text search with RPL36B and pick this entry (R36B_YEAST)
1-2b-c the gene is GAL4
1-2d-e follow the links, localisation is nuclear, GAL4_YEAST
1-2f follow the links

1-3a CFTR mutation database

1-4a-b FlyBase, TSH_DROME

1-5a Genome Net
1-5b KEGG
1-5c EC
1-5d follow the links OXO1_HORVU, GERMIN


This exercise illustrates some of the difficulties of searching databases. There is no accepted, standard nomenclature for genes and their products, so every plausible alternative spelling should be tried, and the variants combined with logical operators. The following query finds many IL-2 receptor sequences:

(IL2 | IL-2 | 'interleukin 2' | interleukin-2) & recept*

However, some receptor entries do not contain the word 'receptor'. It is therefore safer to search with

IL2 | IL-2 | 'interleukin 2' | interleukin-2

and go through the output by hand.
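The way such a query is assembled from a list of spelling variants can be sketched in a few lines of Python. This is only an illustration: the function names `quote_term` and `build_query` are made up for this example, the quoting rule (single quotes around terms containing a space) is simply copied from the query shown in the text, and the code only builds the query string, it does not contact any database.

```python
def quote_term(term):
    """Wrap a term in single quotes if it contains whitespace (assumed convention)."""
    return f"'{term}'" if any(ch.isspace() for ch in term) else term


def build_query(synonyms, required=None):
    """Join spelling variants with OR (|); optionally AND (&) the group with one more term."""
    alternatives = " | ".join(quote_term(s) for s in synonyms)
    if required is None:
        return alternatives
    return f"({alternatives}) & {required}"


synonyms = ["IL2", "IL-2", "interleukin 2", "interleukin-2"]

# Narrow query: any spelling variant AND a word starting with "recept"
print(build_query(synonyms, "recept*"))

# Broad query: any spelling variant, to be checked by hand afterwards
print(build_query(synonyms))
```

The first call reproduces the narrow query, the second the broader one that is recommended when entries may lack the word 'receptor'.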

You will also notice that many EMBL entries contain no link to SWISS-PROT. This is because they have no associated CDS: most of these entries are ESTs, which have not been translated. Therefore, the only way to retrieve the corresponding protein entry is to search SWISS-PROT with the EMBL accession number. The links between SWISS-PROT/TrEMBL and EMBL are bidirectional and up to date (they should be maintained automatically at EMBL/EBI); this is not true for GenBank.

The rest of the exercise can be completed entirely by following links from one Web page to another. There is no single "right" way to do this. In fact, you are encouraged to try alternative routes to the same information.