Metamorphosys

Tutorial : Managing gene based image file names for the Xenopus Image Search Engine.

  • Published: 25 January 2008
This tutorial gives you some advice and ideas on how to name image files for easy inclusion in the Xenopus Image Search Engine. There is no guarantee that we will be able to use your images, or that we will, but following it will make the process quicker and easier for everyone. Adopting a good naming convention at the start of a project will save much pain later on.


There are three things to take care of :

    • image file names
    • sequence IDs and other auxiliary data
    • location of image files

The two most important things are absolute consistency in naming image files, and knowing the actual sequences which were used to generate each image.

The closer you come to perfection the easier it will be for me to integrate your images into our distributed database.

And don’t be put off even if you only have 20 or 30 images - it’s all useful data.

1. Image file names

In general your image file names should be composed of two parts : <source-sequence>_<image-specific-data>.gif (or whatever)

The first part (<source-sequence>) should refer exactly to the sequence you used to generate the image. Typically this will be a clone name or a cDNA accession number, but it may also be (say) a 5’ EST accession number, or it could be your own made-up ID, especially if you are using un-submitted sequences.

The second part (<image-specific-data>) is the bit that tells us what the image is of. It typically contains a sequence of short codes, which again, must be used consistently. These codes may include the following :

  • species (only if mixed)
  • experiment type (must have this if your images are mixed types)
  • stage (must have this one)
  • view
  • treatment

They may also include other elements that you feel would be useful to communicate to a viewer.

The parts of the name should be separated by ’-’ or ’_’ - please don’t use blanks or ’.’, or other difficult characters like =, +, |, >, etc.
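The separator rule above can be checked automatically before any images are sent. The following is a minimal sketch in Python, assuming that letters, digits, ’-’ and ’_’ are the only characters wanted in the name itself, and that the single ’.’ before the file extension is the only dot. The function name `has_safe_name` is hypothetical.

```python
import re

# Assumption: letters, digits, '-' and '_' are the only characters allowed
# in the part of the name before the extension.
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")


def has_safe_name(filename: str) -> bool:
    """Return True if the name has one extension and a stem of safe characters."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or not extension.isalnum():
        return False  # no extension at all, or an odd-looking one
    return ALLOWED_STEM.fullmatch(stem) is not None


for name in ["TGas012p12_exp-10-lat.gif", "TGas012p12 exp=10.lat.gif"]:
    print(name, "OK" if has_safe_name(name) else "please rename")
```

The second name in the loop is a made-up bad example containing a blank, an ’=’ and an extra ’.’.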

The parts of the name should always be in the same order for your whole set of data, and every part you use should be present for all genes, even if it is ’empty’ or ’NA’ for some.

E.g.

  • TGas012p12_exp-10-lat.gif (clone TGas012p12, in situ expression, stage 10, lateral view)
  • TGas012p12_exp-10-ven.gif
  • TGas012p12_exp-30-lat.gif
  • TGas012p12_exp-30-ven.gif

OR

  • CR1987654-MO1-GAS.jpg (sequence CR1987654, morpholino 1, Gastrula)
  • CR1987654-MO1-NEU.jpg

If this is done in such a way that I can retrieve the sequences from these names, and understand your codes, then that’s about all it needs.
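To illustrate what "retrieving the sequence from the name" might look like in practice, here is a minimal Python sketch. It assumes that every name in a data set ends with the same number of short codes, passed in as the hypothetical parameter `n_codes`, so that whatever comes before those codes is the source sequence ID, even if the ID itself contains ’-’ or ’_’. This is one possible approach, not necessarily how the Search Engine processes names.

```python
from pathlib import PurePath


def parse_image_name(filename: str, n_codes: int) -> tuple[str, list[str]]:
    """Split a file name into (source sequence ID, list of short codes).

    n_codes is the number of short codes at the end of every name in the
    data set (a hypothetical parameter chosen for this sketch).
    """
    if n_codes < 1:
        raise ValueError("n_codes must be at least 1")
    stem = PurePath(filename).stem  # drop the extension
    tokens = stem.replace("_", "-").split("-")
    if len(tokens) <= n_codes:
        raise ValueError(f"{filename}: expected an ID plus {n_codes} codes")
    codes = tokens[-n_codes:]
    # Each code is preceded by exactly one separator character.
    suffix_length = sum(len(code) for code in codes) + n_codes
    return stem[:-suffix_length], codes


print(parse_image_name("TGas012p12_exp-10-lat.gif", 3))
print(parse_image_name("CR1987654-MO1-GAS.jpg", 2))
```

Counting codes from the end of the name avoids having to know in advance how many separators the sequence ID contains.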

If an image file is named inconsistently it may not be searchable, although this will probably become clear in the error checking phase.
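A rough consistency check can catch many such names early. The sketch below, in Python, assumes that all names in a set use the same ID format, so that a consistently named file always splits into the same number of parts. It flags any name whose part count differs from the most common one. The function name `find_inconsistent_names` is hypothetical, and the third file name in the example is a made-up bad case.

```python
import re
from collections import Counter


def find_inconsistent_names(filenames: list[str]) -> list[str]:
    """Return names whose number of '-'/'_' separated parts is unusual."""

    def part_count(name: str) -> int:
        stem = name.rsplit(".", 1)[0]  # drop the extension
        return len(re.split(r"[-_]", stem))

    counts = Counter(part_count(name) for name in filenames)
    if not counts:
        return []
    expected = counts.most_common(1)[0][0]  # the most frequent part count
    return [name for name in filenames if part_count(name) != expected]


names = [
    "TGas012p12_exp-10-lat.gif",
    "TGas012p12_exp-10-ven.gif",
    "TGas012p12_exp-30.gif",  # view code missing
]
print(find_inconsistent_names(names))
```

This only detects missing or extra parts. It cannot tell whether the parts are in the right order or use the agreed codes.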

Now you will need to generate a list of all your image files - but probably you have this already. There are ways of generating a list automatically (-ish) if the images are all in one folder.
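One way of generating such a list, sketched in Python, is shown here. It assumes that the images sit directly in a single folder and that the set of extensions in `IMAGE_EXTENSIONS` covers your files. Both function names and the extension set are hypothetical choices for this example.

```python
from pathlib import Path

# Assumption: adjust this set to the image formats you actually use.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def list_image_files(folder: str) -> list[str]:
    """Return the sorted names of image files directly inside folder."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS
    )


def write_image_list(folder: str, output_file: str) -> int:
    """Write one image file name per line; return how many were written."""
    names = list_image_files(folder)
    Path(output_file).write_text("\n".join(names) + "\n", encoding="utf-8")
    return len(names)
```

Calling `write_image_list("my_images", "image_list.txt")`, where both names are placeholders, would produce a plain text file that opens in any spreadsheet program.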

2. Sequence IDs and other auxiliary data

In addition you may need to construct one or two simple spreadsheet data files (that you send to me), although these may not be necessary in all cases.

- List 1. image to sequence reference table

If you have used an indirect reference to a sequence in the first part of your image name, then it will need to be in this data, with each image reference (the first part of your image name) accompanied by either the sequence accession number, or the sequence itself. This also applies if you are using your own clone names.

So either :

  • NP-AX-100024 CR1987654.2
  • NP-AX-100025 CR2455448.1

(your images were named NP-AX-100024-etc...)

OR

  • OB-000234 AGCTCGATAGATTAGGATTAGGATCTCTACGATCG...
  • OB-000235 GCTTCTAGATTTATTATATTATTTCTCTCTCCCTCTCT...

(your images were named OB-000234-etc...)

Or you may provide a .fasta file [1] of your sequences.
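If your IDs and sequences are already in a two-column list, turning them into a .fasta file is straightforward. The following Python sketch assumes the pairs are held in a dictionary. The function name `to_fasta`, the 60-character line width, and the ID and sequence in the example are all hypothetical.

```python
def to_fasta(records: dict[str, str], line_width: int = 60) -> str:
    """Format {ID: sequence} pairs as FASTA text."""
    lines = []
    for seq_id, sequence in records.items():
        lines.append(f">{seq_id}")  # header line for this sequence
        # Wrap the sequence over several lines of at most line_width letters.
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"


# Made-up ID and sequence, for illustration only.
print(to_fasta({"OB-EXAMPLE-1": "ATCGAGGATTGG"}, line_width=6), end="")
```

The IDs used as header lines should match the first part of the image file names exactly.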

N.B. If you are using your own sequences, say amplified from genomic or mRNA material, then much the best strategy is to submit these sequences to GenBank before (finally) naming the images, or generating the image data. The process is quite quick, and has the advantages that (a) you will have an accession number when you come to publish the data, and (b) it will be easier to incorporate the image in the Search Engine.

- List 2. short code translations

This is very simple, and just enables me to get your images described correctly from your embedded codes.

E.g.

    • GAS Gastrula stage
    • NEU Neurula stage
    • TBD Tailbud stage
    • dor dorsal view
    • lat lateral view
    • MO1 first morpholino
    • MO2 second morpholino

etc.

As many as you need - preferably in groups.
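To show how such a translation list could be used, here is a minimal Python sketch. It assumes the list is supplied as plain text with one code per line, the code first and its meaning after whitespace. The function names `load_code_table` and `describe` are hypothetical.

```python
def load_code_table(text: str) -> dict[str, str]:
    """Read 'CODE meaning' lines into a {code: meaning} dictionary."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return table


def describe(codes: list[str], table: dict[str, str]) -> str:
    """Turn the short codes from a file name into a readable description."""
    return ", ".join(table.get(code, f"unknown code {code!r}") for code in codes)


code_list = """GAS Gastrula stage
NEU Neurula stage
MO1 first morpholino
"""
table = load_code_table(code_list)
print(describe(["MO1", "GAS"], table))
```

Reporting unknown codes rather than silently dropping them makes inconsistently used codes easy to spot.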

ALTERNATIVELY (though not recommended)

You can provide a complete spreadsheet list of your image file names with a description for display and either the accession number to retrieve each sequence or the sequences themselves, e.g.

  • AM-first-image.gif ’in situ, stage 20, lateral view’ CS7776765.1
  • AM-second-image.gif ’in situ, stage 30, dorsal view’ CS7776765.1
  • CD-another-one.gif ’tailbud stage, morphant phenotype’ DR_453298.1

3. Location of image files

There are two options here :

- put them in a folder on a local web-server (or any other that you can get access to !)
- send them to me

I prefer the first option, and most of you will have some sort of Institute web server which should be happy to host your image data in a publicly accessible folder (that’s the optimistic view). This should be very easy with a bit of co-operation from your IT folk. If that’s not possible, just send them to me on a CD.

Notes

[1] FASTA format looks like this, with a ’>’ header line followed by the sequence lines :


>seq def
ATCGAG
GATTGG