One of my colleagues asked me about the [your favourite gene name] gene in Xenopus. He wanted a cDNA clone and the sequences required for molecular and evolutionary analysis.
Here is what I did to look for sequences and clones.
One of my colleagues asked me about the aromatase gene in Xenopus. He wanted cDNA clones and the sequences required for molecular and evolutionary analysis.
There is more than one way to do it. In this case, I looked first in the UniGene database on the NCBI web site.
I got the following page when querying for aromatase:
A total of 26 entries were found. The first one on the list has the identifier Xl.948; the Xl prefix tells me that it is a Xenopus laevis entry. On the right, two blue links are available: the one named NIH cDNA clone leads to potential full-length cDNA clones.
Left-click on it and we get:
If I want this clone, I can place an order with one of the IMAGE consortium distributors, using the identifier IMAGE:6944814. In Europe, we have the German RZPD and the British MRC geneservice. Now let us use the blue link located on the right and named « Links »: left-click on it to bring up a contextual menu.
Let us choose the Protein option to have the protein sequence:
Left-click on the protein ID, BC079750, and choose FASTA in the menu called Display.
Here we are: we have identified a cDNA clone and we have a protein sequence corresponding to the gene we are interested in.
So now we would like to know if there is a Xenopus tropicalis gene sequence available for that same gene.
If we go back to our initial search, we got a list of 26 entries for aromatase, but none of them is a Xenopus tropicalis entry (those identifiers start with Str). Let us copy the sequence of the X. laevis aromatase A protein in FASTA format.
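If you would rather handle the copied sequence in a script than by hand, the following minimal Python sketch reads FASTA-formatted text into header and sequence pairs. The record used here is a made-up placeholder, not the real aromatase entry, and the function name parse_fasta is only an illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record: hypothetical header and residues, for illustration only.
example = """>example_protein hypothetical record
MKTLLVAG
HWQRS
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

The sequence string returned for each record is what you paste into a BLAST form; the header line is optional there.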
Now, we will go to the JGI BLAST server for the X. tropicalis draft genome sequence.
Go to the URL http://genome.jgi-psf.org/Xentr4/Xentr4.home.html and select the BLAST menu. Choose tblastn as the alignment program, and paste the protein sequence. You should see a window like this:
Click on the « Submit job » button; after a couple of seconds, a result window like this appears:
It seems that the aromatase sequence can be aligned in its entirety on scaffold_203: the green, pink and red arrows mark local alignments, most likely corresponding to exons. The distinct colors reflect differences in alignment scores. Let us click on « scaffold_203 » to look at these results in their genomic context.
A window similar to the following one should appear:
In this window, if you concentrate on the lower half reproduced here, you are looking at a graphical representation of the results obtained by running several algorithms or similarity searches on the genomic sequence.
From top to bottom you have a set of graphical representations called « tracks ». They represent the results of a bioinformatic analysis of the genomic sequence using either gene prediction tools (such as genewise, fgenesh, etc.) or sequence similarity tools (BLAST against protein databases, etc.).
Each of these tracks can be displayed in different modes: visible, hidden, collapsed or expanded. To switch between visible and hidden, you need to activate a toolbar by clicking on « Open/Close toolbar ».
A toolbar should appear on the left side of your browser window. Look for Advanced Track Controls. Radio buttons enable the selection of the hide, dense or full view for each available track. In particular, you might want to activate the dense view for the Xenopus EST tracks.
If a given track is not available, it means that no results were obtained with this kind of analysis. In this case, note that there is no all_XtEST track.
Further down we have the scale of the representation, showing where we are on the scaffold.
Then the BLAST results we just obtained are drawn in yellow. Rectangles show the regions involved in the alignments reported by BLAST. They are all joined by a chevron line, which indicates that these alignments involve DNA sequences that are colinear on a larger sequence (our query in this case). The orientation of the chevrons indicates the strand: same or reverse-complementary. Here the orientation is the same, so the strand being displayed is the coding strand.
Below it, a representation of the conservation between human and frog sequences, computed using the VISTA tool.
Then a representation of the genomic sequence as a thick black line. Red regions indicate gaps in the sequence. Click on the black line if you want the whole scaffold sequence or part of it.
Finally, a set of representations for various bioinformatic analyses. Note the track entitled Filtered Models Version 1. This track shows what is considered the best model available for a given gene, based on a consensus from a variety of analyses.
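To make the idea behind the chevron-joined BLAST rectangles more concrete, here is a small Python sketch that checks whether a set of local alignments is colinear and on which strand. It is only an illustration of the concept, not the browser's actual method: the Hsp tuple, the function name chain_orientation and all coordinates are hypothetical.

```python
from collections import namedtuple

# One local alignment: query coordinates (protein) and subject coordinates
# (scaffold), with s_start < s_end and the strand given separately.
Hsp = namedtuple("Hsp", "q_start q_end s_start s_end strand")


def chain_orientation(hsps):
    """Return '+' or '-' if the alignments form a colinear chain, else None.

    Colinear means: all alignments lie on the same strand and, when sorted
    along the query, their scaffold positions increase ('+' strand) or
    decrease ('-' strand) in the same order.
    """
    if not hsps:
        return None
    strands = {h.strand for h in hsps}
    if len(strands) != 1:
        return None  # mixed strands cannot be joined in one chain
    strand = strands.pop()
    ordered = sorted(hsps, key=lambda h: h.q_start)
    for prev, curr in zip(ordered, ordered[1:]):
        if strand == "+" and curr.s_start <= prev.s_start:
            return None
        if strand == "-" and curr.s_start >= prev.s_start:
            return None
    return strand


# Hypothetical coordinates, for illustration only.
same_strand = [
    Hsp(1, 60, 1000, 1180, "+"),
    Hsp(61, 120, 2500, 2680, "+"),
    Hsp(121, 200, 4000, 4240, "+"),
]
scrambled = [
    Hsp(1, 60, 4000, 4180, "+"),
    Hsp(61, 120, 1000, 1180, "+"),
]
reverse_strand = [
    Hsp(1, 60, 4000, 4180, "-"),
    Hsp(61, 120, 1000, 1180, "-"),
]

print(chain_orientation(same_strand))
print(chain_orientation(scrambled))
print(chain_orientation(reverse_strand))
```

The first set mimics the situation described in the text: exons found in the same order on the scaffold as in the query, on the displayed strand.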
If you click on the red words « fgenesh1_pg_.C_scaf... » you should get the following page, with information about the protein encoded by this gene model:
For the time being, we will just concentrate on fetching the genomic sequence for that gene, the transcript sequence and the protein sequence. To get the nucleic sequences, click on the first line made of a thin black line (introns) with red rectangles (coding exons) and some blue rectangles (noncoding exons).
The first sequence you see is the gene sequence, with exons highlighted in red. You can modify the number of additional base pairs shown at the 5’ and 3’ ends of the gene by changing the value named « upstream/downstream padding ». This is useful for retrieving the promoter sequence.
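If you have downloaded a scaffold sequence, the effect of the padding value can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: coordinates are 1-based and inclusive, the gene lies on the displayed strand, and the function name gene_region and the toy sequence are hypothetical.

```python
def gene_region(scaffold, gene_start, gene_end, padding=0):
    """Return the gene sequence plus `padding` bases on each side.

    Coordinates are 1-based and inclusive. The window is clipped so that
    it never runs past either end of the scaffold.
    """
    start = max(1, gene_start - padding)
    end = min(len(scaffold), gene_end + padding)
    return scaffold[start - 1:end]


# Toy 20-base scaffold with a pretend gene at positions 6 to 10.
scaffold = "AAAAACCCCCGGGGGTTTTT"

print(gene_region(scaffold, 6, 10))              # gene only
print(gene_region(scaffold, 6, 10, padding=3))   # 3 extra bases on each side
print(gene_region(scaffold, 6, 10, padding=50))  # clipped to the scaffold
```

For a gene on the reverse strand, the extracted window would also need to be reverse-complemented so that the promoter ends up on the 5’ side.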
If you scroll down the page, you can access the transcript sequence.
Now to fetch the protein sequence, go back one page and click on the thick green rectangle.
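The relationship between the three sequences (gene, transcript, protein) can be sketched in Python as follows: exons are cut out of the genomic sequence and joined, and the coding sequence is then translated with the standard genetic code. This is a toy illustration, not the way the browser computes its pages; the function names, the toy scaffold and the exon coordinates are all hypothetical.

```python
from itertools import product

# Standard genetic code, in the usual TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice_exons(scaffold, exons):
    """Join exon segments, given as 1-based inclusive (start, end) pairs."""
    return "".join(scaffold[start - 1:end] for start, end in sorted(exons))


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon.

    Codons containing unknown bases are reported as 'X'.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy scaffold: flank, exon 1, intron, exon 2, flank (made-up sequence).
toy_scaffold = "GG" + "ATGGCC" + "GTAAGTTTCAG" + "AAATGA" + "CC"
toy_exons = [(3, 8), (20, 25)]

transcript = splice_exons(toy_scaffold, toy_exons)
protein = translate(transcript)
print(transcript)
print(protein)
```

In this toy case the transcript contains only coding exons; with a real gene model you would first trim the untranslated regions before translating.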
So far, so good! Now, coming back to our question, we would like to have a cDNA clone corresponding to this gene. If we look carefully in the genome browser window, there is a single track related to X. tropicalis ESTs showing some rectangles. Expanding this track reveals four clusters aligned to this scaffold.
In two instances, a thin line runs across the whole window: this tells us that these two clusters also align somewhere else on the scaffold, pointing to a repetitive tract of some kind. We notice that one cluster (the rectangle in the middle) aligns beyond an exon boundary. The last cluster is shown as a rectangle on the right, with a 5’ boundary in agreement with the gene model and with the alignments of X. laevis cDNAs. This is probably the 3’ terminal region, containing the untranslated region, since protein alignments do not cover the 3’ end.
If you click on the rectangle, you get access to the details of the alignment, and you can check that the two sequences are identical. This is promising: this EST tells us that the corresponding cDNA clone can be used as a reagent for the X. tropicalis aromatase gene.
The problem is that I only have the identifier of the cluster, the number 804646. I cannot order a clone with just this identifier. I need to know whether this clone is from the IMAGE collection and whether I can get it.
To answer these questions, I just need the gene model transcript sequence to start a sequence similarity search among the X. tropicalis ESTs.
Let us do this. We just have to copy the cDNA sequence (obtained earlier), visit the NCBI BLAST page and select « Nucleotide/Nucleotide BLAST (blastn) ».
Here I paste my sequence in the search field, choose est_others as the database, and select Xenopus tropicalis in the menu located in the options.
I usually uncheck the low complexity filter in this case. Now click BLAST, then FORMAT, and wait...
If the NCBI server is too busy, you can use the advanced BLAST server of the Swiss EMBnet node. Select the X. tropicalis EST and HTC databases; this server is quicker than NCBI.
The result is that three ESTs clearly correspond to our gene:
Looking at the alignments, the top two hits are relevant. The third one is odd, since the alignment covers only a segment of the EST, and this segment lies right in the middle of the EST. We are left with two ESTs, whose definition lines are:
>emb|CX885719|CX885719 [Xenopus tropicalis]JGI_CAAL23580.fwd NIH_XGC_tropBrn4 Xenopus tropicalis cDNA clone IMAGE:7793729 5’, mRNA sequence.
>emb|CX885718|CX885718 [Xenopus tropicalis]JGI_CAAL23580.rev NIH_XGC_tropBrn4 Xenopus tropicalis cDNA clone IMAGE:7793729 3’, mRNA sequence.
These correspond to the 5’ and 3’ ends of the same cDNA clone, IMAGE number 7793729.
We can order this clone!
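When a search returns many ESTs, reading the definition lines by eye becomes tedious. The Python sketch below pulls the accession, the IMAGE clone identifier and the read end out of definition lines shaped like the two above, and groups the reads by clone. The regular expression is an assumption based only on these two lines, so other EST libraries may need a different pattern; the function name group_by_clone is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the two definition lines above (an assumption):
# db|accession|..., then "cDNA clone IMAGE:<number> <5 or 3>'".
# Both the straight and the typographic apostrophe are accepted.
DEFLINE = re.compile(r"^>?\w+\|(\w+)\|.*?cDNA clone (IMAGE:\d+) ([35])['’]")


def group_by_clone(deflines):
    """Map each IMAGE clone to its 5' and 3' EST accessions."""
    clones = defaultdict(dict)
    for line in deflines:
        match = DEFLINE.search(line)
        if match:
            accession, clone, end = match.groups()
            clones[clone][end + "'"] = accession
    return dict(clones)


deflines = [
    ">emb|CX885719|CX885719 [Xenopus tropicalis]JGI_CAAL23580.fwd "
    "NIH_XGC_tropBrn4 Xenopus tropicalis cDNA clone IMAGE:7793729 5’, "
    "mRNA sequence.",
    ">emb|CX885718|CX885718 [Xenopus tropicalis]JGI_CAAL23580.rev "
    "NIH_XGC_tropBrn4 Xenopus tropicalis cDNA clone IMAGE:7793729 3’, "
    "mRNA sequence.",
]

clones = group_by_clone(deflines)
for clone, reads in clones.items():
    print(clone, reads)
```

A clone that shows up with both a 5’ and a 3’ read matching the gene model, as here, is a good candidate to order.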
First, note that the X. tropicalis draft genome sequence is available for download, so we can browse it not only with the JGI browser but also with the ENSEMBL and UCSC browsers.
Each browser has its advantages and drawbacks, and gene models can differ from one to the other (especially between JGI and ENSEMBL).
From now on, let us use the JGI browser.
Back in the JGI browser, let us click on GO, the Gene Ontology (pronounced « geo »; more details at www.geneontology.org).
We access a controlled vocabulary used to describe the biochemical activities of gene products (molecular function), their biological roles (biological process) and their cellular localisation (cellular component). Let us inspect what is available concerning the developmental roles of Xenopus genes:
We observe that only 393 genes are labelled as being involved in a developmental process. This is probably an underestimate, since not a single gene is annotated as playing a role in metamorphosis.
Let us click on the number 8 to the right of the term embryonic development.
Well, the least we can say is that many more genes should be in this list. This is one of the purposes of annotation.
Happy data mining!
The next tutorial will cover mining all these Xenopus ESTs and cDNA sequences.
2003-2010 © Metamorphosys - All rights reserved
Last updated: Monday 3 May 2010