Insects have radiated faster than any other group of living organisms, and entomology therefore faces the huge task of understanding almost 80% of the living species on Earth, perhaps 85% of which remain uncollected and undescribed. However, insects are not only important for biodiversity or academic studies: about one third of the world's crop production depends directly or indirectly on pollination by insects. Insect vectors also spread many human and animal diseases, which is a tremendous threat in its own right. Identifying new insect-specific protein domains is therefore of interest, since they could potentially lead to new drug targets.

Within the class Insecta, two complete genomes have already been sequenced: Drosophila melanogaster (fruit fly) and Anopheles gambiae (malaria mosquito; Holt, Subramanian et al. 2002). Several others, such as Bombyx mori, Aedes aegypti and Apis mellifera, are nearly complete or planned (Evans and Gundersen-Rindal 2003). The fruit fly and the malaria mosquito are highly successful dipteran species that diverged about 260 million years ago (Gaunt and Miles 2002). They share a broadly similar body plan and a considerable number of other features, but they also differ substantially in ecology, morphology, lifestyle and genome size. A prominent difference is the ability of Anopheles females to feed on the blood of specific hosts. Hematophagy is linked to specific host-seeking abilities as well as to nutritional challenges and requirements distinct from those of Drosophila (Zdobnov, von Mering et al. 2002). A comparison of the genomes and proteomes of these two dipterans reveals considerable similarities. Almost half of the genes in both genomes are interpreted as orthologs, and these show an average sequence identity of about 56%, slightly lower than that observed between orthologs of the pufferfish and human (Zdobnov, von Mering et al. 2002). Because the pufferfish and human lineages separated much earlier than the two dipterans, this indicates that insect sequences have diverged considerably faster than vertebrate sequences, which is consistent with the enormous diversity within the Insecta.

Altogether, the availability of these genomes opens the way to identifying insect protein domains, which would be very useful for annotating these organisms and for understanding their biochemistry, ecological significance and evolutionary history. This work presents an applied approach to searching for conserved insect-specific domains by clustering insect protein sequences that have not yet been annotated. To this end, we applied MKDOM2, an automated PSI-BLAST-based domain clustering tool (Gouzy, Corpet et al. 1999), to a custom-built, filtered database of insect proteins.
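To make the idea of clustering unannotated sequences more concrete, the toy sketch below illustrates a generic greedy "query, collect similar sequences, remove them, repeat" procedure of the kind that iterative PSI-BLAST clustering tools build on. It is not MKDOM2 and makes several loudly stated assumptions: PSI-BLAST is replaced by a simple 3-mer Jaccard similarity, whole sequences are removed rather than only the matching segments, the shortest remaining sequence is used as the query in each round (our understanding of MKDOM-style tools, to be treated as an assumption), and the names `kmers`, `jaccard`, `greedy_cluster` and the 0.3 threshold are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.3,
                   k: int = 3) -> List[List[str]]:
    """Toy greedy clustering (hypothetical stand-in for a PSI-BLAST-based tool).

    Each round takes the shortest remaining sequence as the query, groups it
    with every remaining sequence whose k-mer similarity to the query reaches
    `threshold`, removes the whole group and repeats until nothing is left.
    """
    remaining = dict(seqs)
    clusters: List[List[str]] = []
    while remaining:
        # Query = shortest remaining sequence; ties broken by identifier.
        query_id = min(remaining, key=lambda sid: (len(remaining[sid]), sid))
        query_kmers = kmers(remaining[query_id], k)
        # The query always joins its own cluster, so every round makes progress
        # even for sequences shorter than k (which have no k-mers).
        members = [sid for sid, s in remaining.items()
                   if sid == query_id
                   or jaccard(query_kmers, kmers(s, k)) >= threshold]
        for sid in members:
            del remaining[sid]
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQRQISF",
           "c": "GGGWWWPPP", "d": "GGGWWWPPPL"}
    print(greedy_cluster(toy))  # [['c', 'd'], ['a', 'b']]
```

On the invented toy sequences, the shortest sequence `c` is taken first and pulls in `d`, and then `a` pulls in `b`. A real pipeline would replace the k-mer score with PSI-BLAST profile searches and would excise only the aligned segments, so that multidomain proteins can contribute to several domain families.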


Evans, J. D. and D. Gundersen-Rindal (2003). "Beenomes to Bombyx: future directions in applied insect genomics." Genome Biol 4(3): 107.
Gaunt, M. W. and M. A. Miles (2002). "An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks." Mol Biol Evol 19(5): 748-61.
Gouzy, J., F. Corpet, et al. (1999). "Whole genome protein domain analysis using a new method for domain clustering." Comput Chem 23(3-4): 333-40.
Holt, R. A., G. M. Subramanian, et al. (2002). "The genome sequence of the malaria mosquito Anopheles gambiae." Science 298(5591): 129-49.
Zdobnov, E. M., C. von Mering, et al. (2002). "Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster." Science 298(5591): 149-59.