The problem here is to track down a full-length clone for a given gene: one that you can then order or retrieve from somewhere and use in experiments.
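There are three primary stages to this process:
(i) identifying the right group of EST sequences;
(ii) working out which of the represented clones are full-length;
(iii) finding out which clones may actually be ordered.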
Most clones, if they are available at all, are still in their original library collections of tens or hundreds of thousands of clones. A small number have been physically re-arrayed in the construction of ‘UniGene’-style sets. You may already have some of the re-arrayed clones, in which case it may make sense to use those where possible.
The Gurdon EST Database is at http://www.gurdon.cam.ac.uk/informatics/Xenopus.html. This tutorial is aimed at making use of this resource; for a more general approach, see the companion tutorial by Nicolas Pollet, "Using these wonderful Xenopus genomic resources".
Finding Full-Length Clones
(i) identifying the right group of EST sequences
The best starting point when searching for expressed sequences for a given gene is the gene sequence itself. The database also provides several ways to search by gene name or by words in BLAST hit descriptions, but most annotation to date is automated and so can be unreliable or misleading.
If you are starting with a gene sequence from a species other than Xenopus, you may get better results by starting with the protein sequence. If you are starting with a Xenopus gene sequence, the best starting point may be the mRNA (cDNA) sequence, although you may want to use only the coding region, as repeat sequences in the UTRs can make the results harder to interpret.
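If you want to use only the coding region of a Xenopus cDNA, one rough way to approximate it is to take the longest ATG-initiated open reading frame. The Python sketch below does this under stated assumptions: it scans only the three forward frames, uses the standard stop codons TAA/TAG/TGA, and ignores ORFs with no in-frame stop. It is an illustration, not the method used by the database; a dedicated tool such as NCBI ORFfinder is more thorough.

```python
# Standard-code stop codons (assumption: no alternative genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop open reading frame on the forward strand.

    Only the three forward frames are scanned; an ORF without an in-frame
    stop codon is ignored. Returns "" if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


if __name__ == "__main__":
    # Made-up toy cDNA: short 5' UTR, ATG ... TAA, then a short 3' UTR.
    print(longest_orf("GGCCATGGCTTCTATGGGTTAAGCCC"))  # ATGGCTTCTATGGGTTAA
```

The reverse strand is deliberately left out because a cDNA pasted in sense orientation normally carries its ORF on the forward strand; if you are unsure of the orientation, run the function on the reverse complement as well.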
In this example we start from the protein sequence of the human Claudin-4 gene, retrieved from NCBI Entrez Gene:
>gi|4502877|ref|NP_001296.1| claudin 4 [Homo sapiens]
MASMGLQVMGIALAVLGWLAVMLCCALPMWRVTAFIGSNIVTSQTIWEGLWMNCVVQSTGQMQCKVYDSL
LALPQDLQAARALVIISIIVAALGVLLSVVGGKCTNCLEDESAKAKTMIVAGVVFLLAGLMVIVPVSWTA
HNIIQDFYNPLVASGQKREMGASLYVGWAASGLLLLGGGLLCCNCPPRTDKPYSAKYSAARSAAASNYV
Now go to the Gurdon EST Database web site (see above), enter through the front page, and choose a clustering project. Unless you’re looking for Xenopus laevis clones, you’ll almost certainly want the top ‘current’ project:
"CURRENT Xt6: all tropicalis ESTs as of Feb ’05 (approx. 1,000,000)"
Hit the [find] button on the right; this takes you directly to the BLAST search page, where you can BLAST your sequence against the current EST clustering data. Paste your sequence into the box, tick (in this case) the "amino acid sequence" box, and hit the [blast] button. This is illustrated in the following screen shot.
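Whether to tick the "amino acid sequence" box depends on what kind of sequence you pasted. The Python sketch below reads a single FASTA record and guesses protein versus nucleotide from the fraction of A/C/G/T/U/N characters. The 90% threshold is an arbitrary assumption rather than a rule used by the database, and nucleotide sequences rich in other IUPAC ambiguity codes could be misclassified.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def parse_fasta(text: str) -> tuple[str, str]:
    """Split one FASTA record into (header, sequence), dropping whitespace."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    header = lines[0][1:]
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def looks_like_protein(sequence: str, threshold: float = 0.9) -> bool:
    """Protein unless at least `threshold` of the characters are A/C/G/T/U/N."""
    if not sequence:
        raise ValueError("empty sequence")
    nucleotide_fraction = sum(c in NUCLEOTIDE_CHARS for c in sequence) / len(sequence)
    return nucleotide_fraction < threshold


if __name__ == "__main__":
    # First line of the human Claudin-4 record used in this tutorial.
    record = """>gi|4502877|ref|NP_001296.1| claudin 4 [Homo sapiens]
MASMGLQVMGIALAVLGWLAVMLCCALPMWRVTAFIGSNIVTSQTIWEGLWMNCVVQSTGQMQCKVYDSL"""
    header, sequence = parse_fasta(record)
    print(looks_like_protein(sequence))  # True
```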

After a few seconds you will see a screen something like the following (the menus etc. have been cropped from the picture), showing in an alignment diagram which EST clusters your sequence has matched. You would normally be interested in the best match, which will be the top one. Once you’ve ascertained that the cluster is likely to be the one you want, you can click on the left-hand link in the alignment diagram to go to the cluster view page, where you can evaluate the individual ESTs.
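The web page already lists the best match first. If you instead run BLAST locally, for example with NCBI BLAST+ and `-outfmt 6` (whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the same "take the top hit" step can be sketched in Python as below. Ranking by bit score with E-value as a tie-breaker is one reasonable choice, not necessarily what the database does, and the cluster identifiers in the test data are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> dict[str, dict]:
    """Return the highest-bit-score hit per query (ties broken by lower E-value)."""
    best: dict[str, dict] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        key = (hit["bitscore"], -hit["evalue"])
        current = best.get(hit["qseqid"])
        if current is None or key > (current["bitscore"], -current["evalue"]):
            best[hit["qseqid"]] = hit
    return best
```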

How do you tell whether this is the right EST cluster, i.e. the one containing the Xenopus tropicalis claudin-4 gene? To help you decide, the tables below the alignment diagram show the top BLAST hits of each EST cluster consensus sequence for several different species. In this case the top human hit is Claudin-4, which strongly suggests that this is indeed the Xenopus tropicalis ortholog. The tables also show that the claudin family is extensive and its members are quite similar, so it is worth checking that the top hit is the family member you want rather than a close relative.
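The check described above (your query finds the cluster, and the cluster's top human hit is your query) is essentially the reciprocal-best-hit heuristic commonly used to suggest orthology. A minimal Python sketch, assuming you already have the best hit in each direction as simple dictionaries; the identifiers are hypothetical.

```python
def is_reciprocal_best_hit(query: str, forward_best: dict[str, str],
                           reverse_best: dict[str, str]) -> bool:
    """True if query's best cluster also has query as its own best hit.

    forward_best maps each human protein to its best-matching EST cluster;
    reverse_best maps each EST cluster consensus to its best-matching human protein.
    """
    cluster = forward_best.get(query)
    return cluster is not None and reverse_best.get(cluster) == query


if __name__ == "__main__":
    # Two paralogs can hit the same cluster; only one is reciprocal.
    forward = {"CLDN4": "cluster_A", "CLDN3": "cluster_A"}
    reverse = {"cluster_A": "CLDN4"}
    print(is_reciprocal_best_hit("CLDN4", forward, reverse))  # True
    print(is_reciprocal_best_hit("CLDN3", forward, reverse))  # False
```

In a large, similar family such as the claudins, a reciprocal best hit is supporting evidence rather than proof, which is why inspecting the hit tables by eye is still worthwhile.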
Now click on the cluster link and you will be taken to the cluster view, which will look something like the following screen shot (although initially the font size will be larger). This view shows the aligned EST sequences on the right, an indication of the protein sequences aligned to the consensus sequence above them, and a list of the clone names with their full-length status on the left. It also marks the start and stop codons on the sequences, in the reading frame of the BLAST hits.

(ii) working out which of the represented clones are full-length
First you need to work out whether the cluster contains the start of translation. This is indicated by the "approximate FL confidence score" in the top left (94% in this example): the nearer it is to 100%, the more likely the start is present. You should also see either a clear open reading frame starting, or the protein alignments all ‘pointing’ to the same ATG (see the following screen shot). In this example it is fairly clear that most of the upper EST sequences contain the ATG start of translation. It’s a good idea to satisfy yourself that the start of translation has been correctly identified: the data you are looking at are analysed automatically, and the analysis does occasionally go wrong.
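The idea of protein alignments all ‘pointing’ to the same ATG can be expressed as a simple tally. In the Python sketch below, the predicted start positions and EST alignment starts are hypothetical inputs (1-based consensus coordinates you might read off the cluster view); the code only illustrates the reasoning and is not how the database computes its FL confidence score.

```python
from collections import Counter


def consensus_start(predicted_starts: list[int]) -> tuple[int, float]:
    """Most common predicted ATG position and the fraction of ESTs agreeing with it."""
    if not predicted_starts:
        raise ValueError("no predicted start positions")
    position, count = Counter(predicted_starts).most_common(1)[0]
    return position, count / len(predicted_starts)


def ests_covering_start(est_starts: dict[str, int], atg_position: int) -> list[str]:
    """Names of ESTs whose alignment begins at or upstream of the ATG.

    est_starts maps an EST name to the 1-based consensus position where its
    alignment begins; atg_position is the 1-based position of the A of the ATG.
    """
    return sorted(name for name, start in est_starts.items() if start <= atg_position)


if __name__ == "__main__":
    print(consensus_start([120, 120, 120, 87]))  # (120, 0.75)
    print(ests_covering_start({"est1": 95, "est2": 130, "est3": 110}, 120))  # ['est1', 'est3']
```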

Now look at the left-hand side of the cluster view (see the following screen shot). You don’t need to worry about most of the columns; for this purpose the important ones are column 6, the full-length status, and column 8, the clone name (the suffix after the name, such as .p1kSP6, carries information about the EST sequencing). The top line here shows that there is already a full-length cDNA sequence (highlighted in green between the ORF and the cluster consensus sequence), in this case one of the Gurdon/Sanger clones. Its status is FL (sequenced full-length), and its name is ‘TNeu026p19’. Below that are all the EST sequences; a green box with a code beginning ‘5..’ marks full-length status for the 5’ EST sequence of a clone, and clones without one are almost certainly not full-length. The darker the green, the better a choice the clone would be. The codes have the following meanings:
The full-length status determined from EST sequences in this analysis is not guaranteed: there may still be an unsequenced region in the clone between the two EST sequences, containing frame-shifts or other anomalies that cannot be detected until that region is sequenced.
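If you copy the clone-name and status columns out of the cluster view, shortlisting candidates is mostly string handling: strip the sequencing suffix after the first dot to get the clone name, then keep clones with at least one status code beginning with ‘5’. The Python sketch below assumes that layout; the clone names and the codes "5A" and "-" in the test data are hypothetical placeholders, and the real codes should be interpreted using the database’s own key.

```python
from collections import defaultdict


def split_est_name(est_name: str) -> tuple[str, str]:
    """Split e.g. 'TNeu026p19.p1kSP6' into ('TNeu026p19', 'p1kSP6')."""
    clone, _, read_info = est_name.partition(".")
    return clone, read_info


def candidate_fl_clones(rows: list[tuple[str, str]]) -> list[str]:
    """Clones with at least one full-length status code starting with '5'.

    rows holds (est_name, status_code) pairs copied from the cluster view
    (columns 8 and 6); status codes are treated as opaque strings.
    """
    statuses: dict[str, list[str]] = defaultdict(list)
    for est_name, status in rows:
        clone, _ = split_est_name(est_name)
        statuses[clone].append(status)
    return sorted(clone for clone, codes in statuses.items()
                  if any(code.startswith("5") for code in codes))
```

As the caveat above says, a clone passing this filter is only a candidate: the unsequenced middle of the insert can still hide problems.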

If you already have the Gurdon/Sanger full-length clone set, you can BLAST against only these full-length sequences by clicking the relevant radio button (see the following screen shot). In this case you simply get a list of the known clone sequences in the alignment window, from which you can click through to the cluster view as before to verify that the correct choice has been made. Note that here the top entry is the same clone that the previous cluster view showed as a full-length cDNA.

(iii) finding out which clones may actually be ordered
If you already have the re-arrayed set of full-length clones, this may not be a problem, although many genes are still not represented in that set. Otherwise, whether or not a clone is available may well affect your choice.
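One way to make the availability question systematic is to rank candidate clones by where you can get them. The Python sketch below is purely illustrative: the source categories and their order (clones already in your lab first, then ones you can order, then the rest) are assumptions, not categories defined by the database, and the clone names are hypothetical.

```python
# Lower rank = preferred source; these categories are illustrative assumptions.
SOURCE_RANK = {"in_lab": 0, "orderable": 1, "unknown": 2}


def rank_candidates(candidates: dict[str, str]) -> list[str]:
    """Order clone names by source preference, then alphabetically for stability."""
    return sorted(candidates,
                  key=lambda clone: (SOURCE_RANK.get(candidates[clone], 2), clone))


if __name__ == "__main__":
    print(rank_candidates({"CloneB02": "orderable", "CloneA01": "unknown",
                           "CloneC03": "in_lab"}))  # ['CloneC03', 'CloneB02', 'CloneA01']
```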
There are three large groups of clones:
At this point you should also be aware of some unavoidable limitations of the clone distribution process. It involves a great deal of physical manipulation, regrowing, etc., during which a small number of clones will effectively lose their original identity [1]. So if you have the option, it may well be worth ordering two clones at the same time, in case one of them turns out not to contain the gene sequence you identified. In any case you should probably always end-sequence the clone you receive to verify that it contains the correct gene sequence.
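Before running a proper BLAST or alignment on an end-sequencing read, a quick sanity check is to measure how many of the read's k-mers also occur in the sequence you expected. This Python sketch uses k = 12 and also checks the reverse complement, since an end read may come from either strand; both choices are assumptions, and a low score should only flag the clone for closer inspection, not condemn it.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(read: str, expected: str, k: int = 12) -> float:
    """Fraction of the read's k-mers found in the expected sequence (either strand)."""
    read, expected = read.upper(), expected.upper()
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0  # read shorter than k
    target = kmers(expected, k) | kmers(reverse_complement(expected), k)
    return len(read_kmers & target) / len(read_kmers)
```

Sequencing errors near the ends of a read will lower the score somewhat, so a value well below 1 but clearly above that of an unrelated sequence is still consistent with the right clone.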
Happy hunting!
Mike Gilchrist July 2006
[1] For a quantitative assessment, see the IMAGE consortium quality control page: http://image.llnl.gov/image/qc/html/QCoverall.shtml
2003-2010 © Metamorphosys - All rights reserved
Last updated: Monday 3 May 2010