Practicals - Biomolecular databases

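Contents

Introduction
Resources
A quick tour of selected databases
Retrieving information from the NCBI with Entrez
Retrieving information from various databases with SRS
Additional exercises
More info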

Introduction

This is the practical session for the chapter "Biomolecular databases" of the course "Introduction to bioinformatics". The slides of the lecture are available in various formats.

During the tutorial part of this practical session, we will define a few biological problems and see how databases can be used to solve them.

It is important to solve the problems of the tutorial, because the results of some database queries will be used as input for the subsequent practicals (sequence alignment, phylogeny).

A series of exercises will give you the opportunity to use the concepts seen in the tutorials to answer some concrete biological questions.

Students should of course feel free to add their own questions to this list; these can be treated afterwards if time permits.



Resources

This tutorial will be based on the following Web resources.

Acronym   Type                            Description and URL
EMBL      Nucleic sequences               The EMBL Nucleic Sequence Database (EBI, UK)
                                          http://www.ebi.ac.uk/embl/
GenBank   Nucleic sequences               GenBank (NCBI, USA)
                                          http://www.ncbi.nlm.nih.gov/Genbank/
DDBJ      Nucleic sequences               DNA Data Bank of Japan
                                          http://www.ddbj.nig.ac.jp/
UniProt   Protein sequences               The Universal Protein Resource
                                          http://www.uniprot.org/
PDB       3D structure of macromolecules  The Protein Data Bank
                                          http://www.rcsb.org/pdb/
EnsEMBL   Genome browser                  EnsEMBL Genome Browser (Sanger Institute and EBI, UK)
                                          http://www.ensembl.org/
UCSC      Genome browser                  UCSC Genome Browser (University of California, Santa Cruz, USA)
                                          http://genome.ucsc.edu/
ECR       Genome browser                  ECR Browser
                                          http://ecrbrowser.dcode.org/
Integr8   Comparative genomics            Access to complete genomes and proteomes
                                          http://www.ebi.ac.uk/integr8/
Prosite   Protein domains                 Protein domains, families and functional sites
                                          http://www.expasy.ch/prosite/
Pfam      Protein domains                 Protein families represented by multiple sequence alignments and hidden Markov models (HMMs) (Sanger Institute, UK)
                                          http://pfam.sanger.ac.uk/
CATH      Protein domains                 Protein Structure Classification
                                          http://www.cathdb.info/
InterPro  Protein domains                 InterPro (EBI, UK)
                                          http://www.ebi.ac.uk/interpro/
GO        Gene ontology                   Gene Ontology Database
                                          http://www.geneontology.org/
Entrez    Multi-database                  A collection of biomolecular databases maintained at the NCBI (USA), accessible via an interface called Entrez
                                          http://www.ncbi.nlm.nih.gov/Entrez/
SRS       Data warehouse                  A collection of biomolecular databases maintained at the European Bioinformatics Institute (EBI, UK), accessible via an interface called SRS
                                          http://srs.ebi.ac.uk/


A quick tour of selected databases

The number of biomolecular databases is growing so fast that it is impossible to give a balanced survey of all existing resources. We have selected a few databases on the basis of various criteria (popularity, ease of access, ...) to illustrate the type of information that can be retrieved from them.

As an exercise, we propose to browse some databases in order to gather information about one particular protein. Each student can do the same analysis with a protein of his or her own interest. If you lack inspiration, you can for example run the exercise with the Drosophila protein Ubx.

Exercise

Choose a protein for which you have some prior knowledge (e.g. the protein Ubx from Drosophila melanogaster), and try to extract all the information relevant to this protein from the databases listed in the table of biomolecular databases above.

Next steps

In the exercise above, we saw that each database can provide a piece of information about some aspect of our protein of interest.

Note that this is just a very small sample of the information that can be obtained via the hundreds of biomolecular databases distributed around the world.

We will now consult two Web servers (NCBI Entrez and EBI SRS) that provide integrated access to multiple databases, making it easier to examine multiple aspects of a protein of interest.



Retrieving information from the NCBI with Entrez

Entrez is a retrieval system for searching several linked databases stored at the NCBI (National Center for Biotechnology Information, United States).

Goal

During this tutorial, we will learn to use the interface of NCBI Entrez to retrieve a protein of interest. As will be seen, a simple formulation of the query generally returns too many hits, and the desired answer may be lost in hundreds or thousands of other records. We will see how to use advanced search options in order to refine the query.
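
The same searches can also be sent to Entrez programmatically through the NCBI E-utilities. The minimal Python sketch below only builds esearch URLs (it does not contact NCBI) to contrast a bare keyword with a query restricted to one organism. It assumes the public esearch endpoint and its db, term and retmax parameters; the example terms are illustrative.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities esearch endpoint (assumed current).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL; urlencode escapes spaces, quotes and brackets."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# A bare keyword matches "Ubx" in any field and any species: many hits.
simple = esearch_url("protein", "Ubx")

# Restricting the organism field narrows the result set.
refined = esearch_url("protein", 'Ubx AND "Drosophila melanogaster"[Organism]')

print(simple)
print(refined)
```

Pasting either URL into a browser should return an XML list of matching record identifiers, which makes it easy to compare the hit counts of the two formulations.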

An example of a simple query

Logical operators

Imposing constraints on a specific field

Specifying constraints on multiple fields
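
Entrez expects Boolean operators in uppercase (AND, OR, NOT) and restricts a term to one field with a tag in square brackets, e.g. [Organism] or [Gene Name]. The small Python sketch below composes such queries as plain strings; the helper names field and combine are hypothetical, and Ultrabithorax is used simply as the full gene name behind the Ubx symbol.

```python
def field(term: str, tag: str) -> str:
    """Restrict a term to one Entrez field; quote multi-word terms."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{tag}]"


def combine(op: str, *clauses: str) -> str:
    """Join clauses with an uppercase Boolean operator, in parentheses."""
    op = op.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {op}")
    return "(" + f" {op} ".join(clauses) + ")"


# Either gene name, but only in Drosophila melanogaster.
gene = combine("OR", field("Ubx", "Gene Name"), field("Ultrabithorax", "Gene Name"))
query = combine("AND", gene, field("Drosophila melanogaster", "Organism"))
print(query)
# ((Ubx[Gene Name] OR Ultrabithorax[Gene Name]) AND "Drosophila melanogaster"[Organism])
```

Explicit parentheses keep the intended grouping unambiguous when OR and AND are mixed in a single query.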

Browsing a protein entry

Saving the protein sequence in FASTA format
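
A FASTA record is a header line starting with ">" followed by the sequence, usually split into lines of fixed width. The Python sketch below writes and reads such records and builds an E-utilities efetch URL requesting FASTA text (rettype=fasta, retmode=text). The record identifier and the toy sequence are hypothetical placeholders, not real database entries, and the 60-character width is just a common convention chosen here.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities efetch endpoint (assumed current).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(db: str, uid: str) -> str:
    """URL asking efetch for one record as plain-text FASTA."""
    params = {"db": db, "id": uid, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record: '>' + header, then the sequence in fixed-width lines."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse (possibly multi-record) FASTA text into {header: sequence}."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


url = efetch_fasta_url("protein", "HYPOTHETICAL_ID")  # placeholder identifier
toy = "MKT" * 30  # 90-residue toy sequence, not a real protein
fasta = to_fasta("hypothetical_protein toy example", toy)
print(fasta)
```

Parsing the written text should give back the original header and sequence, which is a quick sanity check before using a saved file in the alignment practical.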

Getting the query history
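
The Entrez history numbers previous searches and lets you combine them, e.g. "#1 AND #2". As a rough illustration only, the Python sketch below models that idea locally by expanding each #n into the text of search n. The QueryHistory class is hypothetical; Entrez itself combines stored result sets on the server rather than rewriting query text.

```python
import re


class QueryHistory:
    """Toy record of numbered searches, mimicking Entrez-style #n references."""

    def __init__(self) -> None:
        self.queries: list[str] = []

    def expand(self, query: str) -> str:
        """Replace each #n with the parenthesised text of search n."""
        def sub(match: "re.Match[str]") -> str:
            n = int(match.group(1))
            if not 1 <= n <= len(self.queries):
                raise KeyError(f"no search #{n} in history")
            return f"({self.queries[n - 1]})"

        return re.sub(r"#(\d+)", sub, query)

    def add(self, query: str) -> int:
        """Store a query (with #n references expanded) and return its number."""
        self.queries.append(self.expand(query))
        return len(self.queries)


h = QueryHistory()
h.add("Ubx[Gene Name]")                       # search #1
h.add('"Drosophila melanogaster"[Organism]')  # search #2
n = h.add("#1 AND #2")                        # search #3 combines them
print(n, h.queries[-1])
```

Storing the expanded text means a later search can refer to #3 without needing to resolve #1 and #2 again.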



Retrieving information from various databases with SRS


Additional exercises


More info



Jacques van Helden (van-helden.j@univmed.fr)