YODA

General

YODA (Yet-another Oligonucleotide Design Application) is a tool for designing specific oligonucleotide probes for DNA sequences for use in microarrays, or other applications requiring signature oligonucleotides.


While many options are available, YODA can be very easy to use for certain applications. All parameters have "sensible" default settings. For example, if you wish to design a single 60mer oligo for each gene in yeast and you have a single file containing the sequences of all yeast genes, simply load this file as a Design file and press "Run". You will be prompted for a name for the results file, and then the design task will begin. Now all you have to do is wait for the program to finish, and you will be presented with a table of selected oligos.

Panels

Design

The "Design" panel allows you to load one or more DNA sequence files containing sequences for which oligos are to be designed. All accepted oligos will have been screened for the ability to cross-hybridize to any sequence in the Design files.
At least one Design file is required.
All sequence files must be in FASTA format. Sequence characters should be A, T, G or C only (no N or gap characters). A single file can contain multiple sequences. The files must be plain text, with no special formatting (e.g. not Word files).
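A minimal Python sketch of checking a sequence file before loading it, assuming a plain FASTA layout (a ">" title line followed by one or more sequence lines). The function names are hypothetical and not part of YODA. Whether YODA accepts lowercase bases is not stated here, so the sketch conservatively flags anything other than uppercase A, T, G and C.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (title, sequence) pairs."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' title line")
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def invalid_characters(sequence):
    """Return the set of characters other than A, T, G and C (e.g. N or '-')."""
    return set(sequence) - set("ATGC")
```

Any record for which `invalid_characters` returns a non-empty set would need to be cleaned up before it is loaded.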

Genome

The "Genome" panel allows you to load zero or more DNA sequence files containing a larger set of sequences that includes the sequences in the Design files. If any Genome files are specified, the sequences in the Design files must be a subset of the sequences in the Genome files. For example, if you are designing oligos for chromosome 1 of Arabidopsis, you can give the chromosome 1 sequences as a Design file and all Arabidopsis sequences as a Genome file.
Genome files are optional.
All sequence files must be in FASTA format. Sequence characters should be A, T, G or C only (no N or gap characters). A single file can contain multiple sequences. The files must be plain text, with no special formatting (e.g. not Word files).

Host

The "Host" panel allows you to load zero or more DNA sequence files containing sequences for which oligos will not be designed. All selected oligos will have been screened for the ability to cross-hybridize to any sequence in the Host files. For example, to design oligos for a human pathogen, give the pathogen sequences as a Design file and human sequences as a Host file.
The difference between Host files and Genome files is that Design file sequences are contained in the Genome files but not in the Host files.
Host files are optional.
All sequence files must be in FASTA format. Sequence characters should be A, T, G or C only (no N or gap characters). A single file can contain multiple sequences. The files must be plain text, with no special formatting (e.g. not Word files).

Sort

The "Sort" panel allows you to choose zero or more "Probe Sorters". The role of a Probe Sorter is to select the "preferred" oligo(s) for a sequence from all of the candidates that have passed the various filters (Tm range, GC content, etc.). Different Probe Sorters use different criteria for choosing "preferred" oligos. The default Probe Sorter is the "Coverage Probe Sorter", which seeks to find oligos evenly spaced throughout the sequence. An oligo must meet all stringency requirements before it can be selected by a Probe Sorter.

Parameters

The "Parameters" panel allows you to view and alter the various parameters used in the oligo design task. For more detail about the individual parameters and their meanings, please see the "Parameter Details" section below.

Status

The "Status" panel displays information about the progress of the design task, including the number of sequences for which oligos were successfully chosen, the number of sequences for which no oligos were chosen, and the total run time for the design task.

Parameter Details

Oligo Length

This is the length, in nucleotides, of the oligos to be selected.

Max % Identity

This is the maximum percent identity allowed between a selected oligo and any other sequence being considered. Note that changing this value affects the total run time of the program. If Max % Identity is set below 80%, the program will take significantly longer to run (about three times as long). This is because YODA uses a heuristic to find matches at 80% or greater identity.
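To make percent identity concrete, here is a simplified Python sketch that compares an oligo against every same-length window of another sequence, without gaps. This is an assumption for illustration only: it is not YODA's heuristic, it scans only one strand, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def max_identity_to(oligo, target):
    """Highest ungapped percent identity of oligo against any same-length window of target."""
    k = len(oligo)
    best = 0.0
    for start in range(len(target) - k + 1):
        best = max(best, percent_identity(oligo, target[start:start + k]))
    return best


def passes_identity(oligo, other_sequences, max_pct_identity):
    """True if no window of any other sequence exceeds the identity limit."""
    return all(max_identity_to(oligo, s) <= max_pct_identity for s in other_sequences)
```

A brute-force scan like this grows with the total length of all sequences considered, which is why a real tool relies on faster search heuristics.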

Tm Range

This is the range, in degrees C, of acceptable melting temperatures. The range is centered on the calculated mean melting temperature of all oligos in the Design files. For example, if Oligo Length is 60 and Tm Range is 6, YODA calculates the mean Tm of all 60mers in the sequences in the Design files. If this mean Tm is 75 deg. C, then the range of acceptable Tms will be 72 - 78 deg. C.
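The centering of the Tm window can be sketched in Python as follows. The melting temperatures are taken as given (the Tm formula YODA uses is not described here), treating the bounds as inclusive is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def tm_window(all_tms, tm_range):
    """Return (low, high) acceptable Tm bounds, centered on the mean of all_tms."""
    center = mean(all_tms)
    half = tm_range / 2.0  # half of the range lies on each side of the mean
    return center - half, center + half


def tm_ok(tm, bounds):
    """True if tm falls inside the (inclusive) bounds."""
    low, high = bounds
    return low <= tm <= high
```

With a mean Tm of 75 and a Tm Range of 6, `tm_window` returns bounds of 72 and 78, matching the example above.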

%GC Range

This works like Tm Range, but applies to percent G+C content instead of melting temperature.
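A similar Python sketch for %GC computes the mean G+C content over every oligo-length window of the Design sequences and centers the range on it. It assumes the centering works exactly as described for Tm Range; the function names are hypothetical.

```python
def gc_percent(seq):
    """Percent G+C of a sequence."""
    return 100.0 * sum(1 for base in seq if base in "GC") / len(seq)


def gc_window(design_sequences, oligo_length, gc_range):
    """(low, high) %GC bounds centered on the mean %GC of all oligo-length windows."""
    values = [
        gc_percent(seq[i:i + oligo_length])
        for seq in design_sequences
        for i in range(len(seq) - oligo_length + 1)
    ]
    if not values:
        raise ValueError("no sequence is at least oligo_length long")
    center = sum(values) / len(values)
    return center - gc_range / 2.0, center + gc_range / 2.0
```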

Max Consecutive Matches

In addition to overall percent identity between selected oligos and other sequences, YODA limits the length of stretches of exact matches between the selected oligo and any other sequence. This value sets that limit. If Max Consecutive Matches is set to 15, then every stretch of 16 nucleotides in any selected oligo is unique within all sequences being considered. The acceptable range is from 14 to Oligo Length.
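One way to measure the longest stretch of exact matches is the classic dynamic-programming longest-common-substring method, sketched below in Python. This is an illustration under the assumption that "other sequences" excludes the oligo's own source region; it is not necessarily how YODA implements the check, and the names are hypothetical.

```python
def longest_common_substring(a, b):
    """Length of the longest stretch of exact matches shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: match length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, cur[j])
        prev = cur
    return best


def passes_consecutive_matches(oligo, other_sequences, max_consecutive):
    """True if no other sequence shares an exact run longer than max_consecutive."""
    return all(
        longest_common_substring(oligo, other) <= max_consecutive
        for other in other_sequences
    )
```

With a limit of 15, an oligo passes only if every 16-base stretch of it is absent from the other sequences, which is the uniqueness condition described above.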

Dimer Window/Stringency

Oligos are checked for the ability to homo-dimerize. This is done by checking for a number of base-pairing nucleotides within a certain window size. For example, with a Dimer Window of 15 and a Dimer Stringency of 13, the oligo is checked for any possible alignment with itself giving a stretch of 15 bases where 13 or more of the bases form Watson-Crick pairs, that is, a stretch of 15 bases with 2 or fewer mismatched pairs.
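The dimer check described above can be sketched in Python by aligning the oligo against its own reverse complement at every offset: wherever a base equals the facing base of the reverse complement, the two antiparallel copies form a Watson-Crick pair. This is a hypothetical illustration, not YODA's code, and it ignores G-T wobble pairs.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of an A/T/G/C sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_dimerize(oligo, window, stringency):
    """True if two antiparallel copies of oligo can line up so that some stretch
    of `window` facing positions has at least `stringency` Watson-Crick pairs."""
    rc = reverse_complement(oligo)
    n = len(oligo)
    for offset in range(-(n - 1), n):
        # pairs[k] is True when the k-th overlapping position forms a pair
        pairs = [oligo[i] == rc[i - offset] for i in range(n) if 0 <= i - offset < n]
        for start in range(len(pairs) - window + 1):
            if sum(pairs[start:start + window]) >= stringency:
                return True
    return False
```

For the example above, `can_dimerize(oligo, 15, 13)` would reject any oligo with a 15-base self-alignment containing 2 or fewer mismatches.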

Hairpin Window/Stringency/Min Gap/Max Gap

Oligos are checked for the ability to form hairpin, or stem-loop, structures. A potential hairpin is indicated by a region of the oligo that is able to base-pair with another region of the same oligo. As with dimerization, a window length and a stringency value are used to determine this. In addition, the two pairing regions must be separated by several bases (the "loop"). The allowed lengths of the loop are set by Hairpin Min Gap and Hairpin Max Gap.
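A comparable Python sketch for hairpins, assuming each stem arm is exactly Hairpin Window bases long and the loop length ranges from Hairpin Min Gap to Hairpin Max Gap inclusive. Because the 3' arm folds back on the 5' arm, the first base of the 5' arm faces the last base of the 3' arm. Function and parameter names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def can_form_hairpin(oligo, window, stringency, min_gap, max_gap):
    """True if some 5' arm and 3' arm of length `window`, separated by a loop of
    min_gap..max_gap bases, share at least `stringency` Watson-Crick pairs."""
    n = len(oligo)
    for left in range(n - window + 1):
        stem5 = oligo[left:left + window]
        for gap in range(min_gap, max_gap + 1):
            right = left + window + gap
            if right + window > n:
                break  # larger loops would run past the 3' end
            stem3 = oligo[right:right + window]
            # The 3' arm folds back, so stem5[k] faces stem3[window - 1 - k].
            pairs = sum(
                1 for k in range(window)
                if COMPLEMENT[stem5[k]] == stem3[window - 1 - k]
            )
            if pairs >= stringency:
                return True
    return False
```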

Max Poly X

This is the maximum number of consecutive occurrences of the same base (for example, AAAAA is a run of 5).
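Measuring the longest single-base run is straightforward. This hypothetical Python sketch uses `itertools.groupby` and assumes that a run exactly equal to Max Poly X is still allowed.

```python
from itertools import groupby


def longest_run(seq):
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def passes_poly_x(oligo, max_poly_x):
    """True if no base repeats more than max_poly_x times in a row."""
    return longest_run(oligo) <= max_poly_x
```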

Prohibited Sequences

These are sequences to be avoided in selected oligos. Enter one sequence per line. Use only A, T, G, and C. Sequences should be no longer than 15 bases. One possible use for this is to avoid the presence of a particular restriction site in your oligos.
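A hypothetical Python sketch of parsing the Prohibited Sequences entry and screening an oligo against it. It enforces the rules above (A, T, G and C only, at most 15 bases) and checks only the oligo strand as written; whether YODA also checks reverse complements is not stated here, although palindromic restriction sites such as GAATTC read the same on both strands anyway.

```python
def parse_prohibited(text, max_length=15):
    """Split the entry into one sequence per line, validating each line."""
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    for entry in entries:
        if set(entry) - set("ATGC"):
            raise ValueError(f"{entry!r}: use only A, T, G and C")
        if len(entry) > max_length:
            raise ValueError(f"{entry!r}: longer than {max_length} bases")
    return entries


def contains_prohibited(oligo, prohibited):
    """Return the prohibited sequences that occur in the oligo."""
    return [p for p in prohibited if p in oligo]
```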

Elapsed Time

At the top right corner there is a timer. This is to provide an idea of how long a task has been running. The timer can also serve to reassure you that the program is still running and hasn't frozen.

Output

Each Probe Sorter produces its own output file, with a unique file extension. These are plain text files. On Windows systems these files can be opened with WordPad. On Mac OS X these files can be opened with TextEdit.
The data in the output files consists of one line for each oligo selected. The columns for the oligo information are: FASTA title line, oligonucleotide sequence, distance of oligo start position from the 5' end, distance of oligo start position from the 3' end (denoted with a "-" to indicate that this distance is counted back from the 3' end), percent G+C of the oligo, melting temperature of the oligo.
The output files are in tab-delimited format, so they can be read by many spreadsheet programs. You may need to rename the file so it ends with ".csv" or ".xls" in order to open with a spreadsheet program.
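For processing results in a script rather than a spreadsheet, a minimal Python sketch of reading an output file with the standard `csv` module follows. The `OligoRecord` field names are hypothetical labels for the six columns listed above, and the sketch assumes there is no header row and that the two distance columns are whole numbers.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class OligoRecord:
    title: str          # FASTA title line
    sequence: str       # oligonucleotide sequence
    from_5prime: int    # distance of oligo start from the 5' end
    from_3prime: int    # distance from the 3' end, written with a leading "-"
    percent_gc: float   # percent G+C of the oligo
    tm: float           # melting temperature of the oligo


def read_yoda_output(handle):
    """Parse a tab-delimited output file into OligoRecord objects."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        title, seq, d5, d3, gc, tm = row[:6]
        records.append(OligoRecord(title, seq, int(d5), int(d3), float(gc), float(tm)))
    return records
```

The function accepts any open text file, so `with open(path) as handle: records = read_yoda_output(handle)` works for an output file at `path`.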

In addition to the oligo files, there are two additional files for each Probe Sorter. One has the extension ".no.titles" and contains the title lines from each sequence for which the sorter was unable to select oligos. The other has the extension ".no.sequences" and contains the complete sequences corresponding to these title lines. These files may be useful for attempting to design oligos for these sequences with relaxed stringencies.

Oligo Viewer

YODA includes a simple table viewer for displaying the results. By default, a table will open to display the results from each Probe Sorter that was used. The viewer currently has very limited data manipulation capabilities.

Credits

YODA was developed by Eric Nordberg at The Virginia Bioinformatics Institute.
Questions/Comments/Suggestions are welcome.
enordber@vbi.vt.edu


Last modified on 16Feb2008