BLASTP 2.2.1 [Jul-12-2001]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= sp|Q04323|Y33K_HUMAN Hypothetical 33.4 kDa protein
         (298 letters)

Database: swiss
           102,164 sequences; 37,554,368 total letters

Searching..................................................done

                                                                    Score    E
Sequences producing significant alignments:                        (bits) Value

sp|Q04323|Y33K_HUMAN Hypothetical 33.4 kDa protein.[Homo sapiens]     387  e-108
sp|Q9VCE9|UAS3_DROME (CG13604)UBASH3A protein homolog.[Drosophil...    64  3e-10
sp|P56399|UBP5_MOUSE (USP5..)Ubiquitin carboxyl-terminal hydrola...    60  6e-09
sp|P45974|UBP5_HUMAN (USP5..)Ubiquitin carboxyl-terminal hydrola...    60  6e-09
sp|P38237|UBPE_YEAST (UBP14..)Ubiquitin carboxyl-terminal hydrol...    57  5e-08
sp|Q92995|UBPD_HUMAN (USP13..)Ubiquitin carboxyl-terminal hydrol...    51  2e-06
sp|P54201|UBPA_DICDI (UBPA)Ubiquitin carboxyl-terminal hydrolase...    51  2e-06
sp|P34631|YOJ8_CAEEL (ZK353.8)Hypothetical 51.6 kDa protein ZK35...    50  3e-06
sp|P57075|UAS3_HUMAN (UBASH3A)UBASH3A protein.[Homo sapiens]           49  7e-06
sp|P38349|YB9R_YEAST (YBR273C..)Hypothetical 50.0 kDa protein in...    47  3e-05
sp|P47049|YJE8_YEAST (YJL048C..)Hypothetical 45.0 kDa protein in...    40  0.005
sp|P54731|FAF1_MOUSE (FAF1)FAF1 protein (FAS-associated factor 1...    38  0.022
sp|O76387|Y248_CAEEL (C24G6.8)Hypothetical 33.7 kDa protein C24G...    36  0.065
sp|Q10483|YDFB_SCHPO (SPAC17C9.11C)Hypothetical 27.6 kDa protein...    31  2.1
sp|P10688|PID1_RAT (PLCD1)1-phosphatidylinositol-4,5-bisphosphat...    30  3.6
sp|Q02257|PLAK_MOUSE (JUP)Junction plakoglobin (Desmoplakin III)...    30  4.7
sp|P14923|PLAK_HUMAN (JUP..)Junction plakoglobin (Desmoplakin II...    30  4.7
sp|Q9ZEU3|EFTU_APPPP (TUF)Elongation factor Tu (EF-Tu).[Apple pr...    30  4.7
sp|Q9C291|MR11_NEUCR (MUS-23..)Double-strand break repair protei...    30  6.1

- What are the positive matches? They are shown in green
- Do you detect a potential domain? Yes, in the first 60 residues of the query
- What happens if you remove the filter (uncheck the BLAST filter button)? The results are contaminated by spurious hits that match a glutamic acid-rich, low-complexity region
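One way to pick out the positive matches from the hit list above is to apply an E-value cutoff. The following is only a minimal sketch: the cutoff of 1e-3 and the helper name `parse_hits` are assumptions made for the illustration, not part of the exercise, and only a few lines of the table are reproduced.

```python
# A few lines copied from the BLAST summary table above.
HIT_TABLE = """\
sp|Q04323|Y33K_HUMAN Hypothetical 33.4 kDa protein.[Homo sapiens] 387 e-108
sp|Q9VCE9|UAS3_DROME (CG13604)UBASH3A protein homolog.[Drosophil... 64 3e-10
sp|P47049|YJE8_YEAST (YJL048C..)Hypothetical 45.0 kDa protein in... 40 0.005
sp|Q10483|YDFB_SCHPO (SPAC17C9.11C)Hypothetical 27.6 kDa protein... 31 2.1
"""

E_CUTOFF = 1e-3  # assumed threshold, chosen only for this example


def parse_hits(table):
    """Return (identifier, bit score, E-value) for each summary line."""
    hits = []
    for line in table.splitlines():
        if not line.strip():
            continue
        # The last two whitespace-separated fields are the score and the E-value.
        left, bits, evalue = line.rsplit(None, 2)
        # BLAST prints very small values as "e-108"; float() needs "1e-108".
        if evalue.startswith("e"):
            evalue = "1" + evalue
        hits.append((left.split()[0], int(bits), float(evalue)))
    return hits


significant = [hit for hit in parse_hits(HIT_TABLE) if hit[2] <= E_CUTOFF]
for ident, bits, evalue in significant:
    print(ident, bits, evalue)
```

A stricter or looser cutoff changes which borderline hits (for example the one at 0.005) are kept, so the value should be chosen with the database size in mind.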
2) Select the matching sequences and create a multi-FASTA file by copy and paste

>y33k_human
MAELTALESLIEMGFPRGRAEKALALTGNQGIEAAMDWLMEHEDDPDVDEPLETPLGHIL
>uas3_drome
LTPLQTLLQMGFPRHRAEKALASTGNRGVQIASDWLLAHVNDGTLDE
>ubp5_mouse1
MLDESVIIQLVEMGFPMDACRKAVYYTGNSGAEAAMNWVMSHMDDPDFANPLILP
>ubp5_mouse2
TIVSMGFSRDQALKALRATNNSLERAVDWIFSHIDDLDAEAAMDISEG
>ubp5_human1
MLDESVIIQLVEMGFPMDACRKAVYYTGNSGAEAAMNWVMSHMDDPDFANPLILP
>ubp5_human2
TIVSMGFSRDQALKALRATNNSLERAVDWIFSHIDDLDAEAAMDISEG
>ubpe_yeast
SISQLIEMGFTQNASVRALFNTGNQDAESAMNWLFQHMDDPDLNDPFVPP
>ubpd_human1
SSVMQLAEMGFPLEACRKAVYFTGNMGAEVAFNWIIVHMEEPDFAEPLTMP
>ubpd_human2
ITSMGFQRNQAIQALRATNNN-LERALDWIFSHPEFEEDSD
>ubpa_dicdi1
LDTLLSMDFPLVRCKKALLATGGKDAELAMNWIFEHTEDPDID
>ubpa_dicdi2
VDNIIGMGFTDSQAKLALKNTKGNLERAADWLFSHIDNLD
>uas3_human
LEPLLAMGFPVHTALKALAATGRKTAEEALAWLHDHCNDPSLDDPI

then perform a multiple alignment using ClustalW (or emma on the command line)

CLUSTAL W (1.74) multiple sequence alignment

y33k_human      MAELTALESLIEMGFPRGRAEKALALTGNQGIEAAMDWLMEHEDDPDVDEPLETPLGHIL
uas3_drome      ---LTPLQTLLQMGFPRHRAEKALASTGNRGVQIASDWLLAHVNDGTLDE----------
ubp5_mouse1     MLDESVIIQLVEMGFPMDACRKAVYYTGNSGAEAAMNWVMSHMDDPDFANPLILP-----
ubp5_mouse2     --------TIVSMGFSRDQALKALRATNNS-LERAVDWIFSHIDDLDAEAAMDISEG---
ubp5_human1     MLDESVIIQLVEMGFPMDACRKAVYYTGNSGAEAAMNWVMSHMDDPDFANPLILP-----
ubp5_human2     --------TIVSMGFSRDQALKALRATNNS-LERAVDWIFSHIDDLDAEAAMDISEG---
ubpe_yeast      -----SISQLIEMGFTQNASVRALFNTGNQDAESAMNWLFQHMDDPDLNDPFVPP-----
ubpd_human1     ----SSVMQLAEMGFPLEACRKAVYFTGNMGAEVAFNWIIVHMEEPDFAEPLTMP-----
ubpd_human2     ---------ITSMGFQRNQAIQALRATNNN-LERALDWIFSHP---EFEEDSD-------
ubpa_dicdi1     ------LDTLLSMDFPLVRCKKALLATGGKDAELAMNWIFEHTEDPDID-----------
ubpa_dicdi2     ------VDNIIGMGFTDSQAKLALKNTKGN-LERAADWLFSHIDNLD-------------
uas3_human      ------LEPLLAMGFPVHTALKALAATGRKTAEEALAWLHDHCNDPSLDDPI--------
                         :  *.*    .  *:  *     : *  *:  *

- How many conserved residues do you see (count the stars)? 7 conserved residues
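The star count can also be checked programmatically. The sketch below copies the aligned rows from the ClustalW output above and reports the columns where every sequence carries the same residue and none has a gap; the function name `conserved_columns` is a hypothetical helper introduced for the illustration.

```python
# Aligned rows copied from the ClustalW output above (60 columns each).
ALIGNMENT = {
    "y33k_human":  "MAELTALESLIEMGFPRGRAEKALALTGNQGIEAAMDWLMEHEDDPDVDEPLETPLGHIL",
    "uas3_drome":  "---LTPLQTLLQMGFPRHRAEKALASTGNRGVQIASDWLLAHVNDGTLDE----------",
    "ubp5_mouse1": "MLDESVIIQLVEMGFPMDACRKAVYYTGNSGAEAAMNWVMSHMDDPDFANPLILP-----",
    "ubp5_mouse2": "--------TIVSMGFSRDQALKALRATNNS-LERAVDWIFSHIDDLDAEAAMDISEG---",
    "ubp5_human1": "MLDESVIIQLVEMGFPMDACRKAVYYTGNSGAEAAMNWVMSHMDDPDFANPLILP-----",
    "ubp5_human2": "--------TIVSMGFSRDQALKALRATNNS-LERAVDWIFSHIDDLDAEAAMDISEG---",
    "ubpe_yeast":  "-----SISQLIEMGFTQNASVRALFNTGNQDAESAMNWLFQHMDDPDLNDPFVPP-----",
    "ubpd_human1": "----SSVMQLAEMGFPLEACRKAVYFTGNMGAEVAFNWIIVHMEEPDFAEPLTMP-----",
    "ubpd_human2": "---------ITSMGFQRNQAIQALRATNNN-LERALDWIFSHP---EFEEDSD-------",
    "ubpa_dicdi1": "------LDTLLSMDFPLVRCKKALLATGGKDAELAMNWIFEHTEDPDID-----------",
    "ubpa_dicdi2": "------VDNIIGMGFTDSQAKLALKNTKGN-LERAADWLFSHIDNLD-------------",
    "uas3_human":  "------LEPLLAMGFPVHTALKALAATGRKTAEEALAWLHDHCNDPSLDDPI--------",
}


def conserved_columns(rows):
    """Return 1-based indices of columns with one identical residue and no gap."""
    positions = []
    for index, column in enumerate(zip(*rows), start=1):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            positions.append(index)
    return positions


positions = conserved_columns(ALIGNMENT.values())
print(len(positions), positions)
```

For this alignment the script reports 7 columns (13, 15, 23, 27, 35, 38 and 42), which correspond to the residues M, F, A, T, A, W and H marked with stars.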
3) Create a simple PROSITE pattern and search it against Swiss-Prot (or use fuzzpro on the command line)
M-[GD]-F-x(4)-[SAC]-x(2)-A-[LV]-x(2)-T-x(4,5)-[EQ]-x-A-x(2)-W-[LIV]-x(2)-H
- Do you detect more proteins? No
- Why? Because the pattern is too stringent. It is better to start with a relaxed pattern and increase its stringency step by step to remove false positives
4) Run PSI-BLAST against Swiss-Prot with the first 60 residues of Y33K_HUMAN
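To see what the pattern accepts, it can be translated into a regular expression and tried on a few of the sequences from step 2. This is a rough sketch rather than a replacement for fuzzpro or the PROSITE scanner: `prosite_to_regex` is a hypothetical helper that handles only the subset of PROSITE syntax used here (single residues, `[..]`, `{..}`, `x` and repeat counts), and the negative control sequence is made up for the example.

```python
import re

PATTERN = "M-[GD]-F-x(4)-[SAC]-x(2)-A-[LV]-x(2)-T-x(4,5)-[EQ]-x-A-x(2)-W-[LIV]-x(2)-H"

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core == "x":                      # x means any residue
            core = "."
        elif core.startswith("{"):           # {AB} means any residue except A or B
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # (n) or (n,m) repeat counts
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = re.compile(prosite_to_regex(PATTERN))

# Three sequences from the multi-FASTA file above, plus a made-up negative control.
SEQUENCES = {
    "y33k_human": "MAELTALESLIEMGFPRGRAEKALALTGNQGIEAAMDWLMEHEDDPDVDEPLETPLGHIL",
    "ubp5_mouse2": "TIVSMGFSRDQALKALRATNNSLERAVDWIFSHIDDLDAEAAMDISEG",
    "ubpa_dicdi1": "LDTLLSMDFPLVRCKKALLATGGKDAELAMNWIFEHTEDPDID",
    "control": "MKTAYIAKQRQISFVKSHFSRQ",
}

for name, seq in SEQUENCES.items():
    hit = regex.search(seq)
    print(name, hit.group(0) if hit else "no match")
```

Relaxing the pattern amounts to widening a residue class or a repeat range in this expression, then checking which additional sequences begin to match.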
- Iterate 3-4 times by selecting potential matches
- Do you detect more proteins? Yes
- Why? Because PSI-BLAST searches with a position-specific profile built from the hits of the previous round, so it tolerates mismatches and amino acid substitutions that an exact pattern rejects
- What is the conserved domain? It is a ubiquitin-associated (UBA) domain, found in many proteins involved in the ubiquitin pathway. It is known to bind ubiquitin.
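The difference between a pattern and a profile can be illustrated with a toy example. The sketch below is a deliberately simplified position-specific scoring scheme (residue counts plus a flat pseudocount, log-odds against a uniform background); it is not the PSI-BLAST algorithm, which also uses sequence weighting, substitution-matrix-based pseudocounts and gapped alignment. The core segments are columns 13 to 27 of the ClustalW alignment for the ten distinct sequences; the variant and the unrelated sequence are made up for the example.

```python
import math
import re

# Columns 13-27 of the alignment above (gap-free), one row per distinct sequence.
CORE = [
    "MGFPRGRAEKALALT", "MGFPRHRAEKALAST", "MGFPMDACRKAVYYT", "MGFSRDQALKALRAT",
    "MGFTQNASVRALFNT", "MGFPLEACRKAVYFT", "MGFQRNQAIQALRAT", "MDFPLVRCKKALLAT",
    "MGFTDSQAKLALKNT", "MGFPVHTALKALAAT",
]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
PSEUDOCOUNT = 1.0  # assumed flat pseudocount, chosen for simplicity


def build_profile(seqs):
    """Per-column log2 odds of each residue against a uniform background."""
    profile = []
    for column in zip(*seqs):
        total = len(column) + PSEUDOCOUNT * len(ALPHABET)
        profile.append({
            aa: math.log2((column.count(aa) + PSEUDOCOUNT) / total * len(ALPHABET))
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the column scores of an ungapped segment of the same length."""
    return sum(column[aa] for column, aa in zip(profile, seq))


profile = build_profile(CORE)

# The matching part of the PROSITE pattern from step 3, as a regular expression.
pattern_core = re.compile(r"M[GD]F.{4}[SAC].{2}A[LV].{2}T")

variant = "MGFPRGRTEKAIALT"    # hypothetical: two substitutions the pattern forbids
unrelated = "KKKKKKKKKKKKKKK"  # hypothetical unrelated segment

for name, seq in [("variant", variant), ("unrelated", unrelated)]:
    matched = pattern_core.fullmatch(seq) is not None
    print(name, "pattern match:", matched, "profile score:", round(score(profile, seq), 2))
```

The variant fails the pattern outright yet still scores well above the unrelated segment, because the profile penalizes each unusual residue a little instead of rejecting the whole sequence.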