OligoDesign manual

Introduction

The OligoDesign system designs optimal LNA-modified oligonucleotides for the detection of expressed genes in expression microarrays. It takes the nucleotide sequence of the target gene as input and calculates a prioritized list of optimal oligonucleotides.

The OligoDesign system features secondary structure prediction for LNA-modified oligonucleotides, melting temperature prediction for LNA-spiked oligonucleotides, genome-wide cross-hybridization prediction, secondary structure prediction of the target, and recognition and filtering of the target in the genome. These features are determined for each possible probe of the query gene and presented to an artificial neural network. The probes are then ranked according to the neural network prediction, and the top-scoring probes are returned.

Step-by-step guide to the successful design of an oligonucleotide

1. Enter the target gene sequence.

The sequence of the target gene
Enter the nucleotide sequence of the target gene; spaces and digits are ignored. The permitted nucleotides are a, c, g, and t. The maximum sequence length is 10000 nt.

Optionally, a sequence name may be specified by providing the sequence in FASTA format, e.g.:

>Z81127 C. elegans example
aaagtgactaacggaggatctcgccaattatctttgagagacaaaactgaaactccttat
ttaaatgcaacaattgcggaagtacaacgacatgcatccatcctcaatatcaatttttgg
cggatcaataatgagccaacagtaattggaggacatcctgtcgactcaggatgtttgatt
gcttcccaattgagtgctcttcatacaaatgagaaaatctttgaaaatcctgagaaattc
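
The input rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules (optional FASTA header, spaces and digits ignored, only a, c, g, t allowed, at most 10000 nt), not the server's actual parser. The function name parse_target and the decision to lowercase the input are assumptions.

```python
import re

MAX_LENGTH = 10000  # maximum sequence length stated in this manual


def parse_target(text):
    """Return (name, sequence) from raw or FASTA-formatted input.

    Hypothetical helper: it mirrors the input rules in the manual,
    not the real OligoDesign code.
    """
    name = None
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        # The FASTA header line carries the optional sequence name.
        name = lines[0][1:].strip()
        lines = lines[1:]
    # Spaces and digits are ignored; lowercasing is an assumption.
    sequence = re.sub(r"[\s\d]", "", "".join(lines)).lower()
    invalid = set(sequence) - set("acgt")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    if len(sequence) > MAX_LENGTH:
        raise ValueError(f"sequence longer than {MAX_LENGTH} nt")
    return name, sequence
```

For example, parse_target(">Z81127 C. elegans example\naaag tgac 10\ntaac") yields the name "Z81127 C. elegans example" and the sequence "aaagtgactaac".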

2. Adjust the parameters.

database
Choose the pool of genes that the oligonucleotide might encounter during the hybridization experiment; if the target gene is from C. elegans, choose c_elegans. The oligonucleotide is selected so that it matches only the target gene and none of the other genes. The available databases are:

identifier     # sequences  # nucleotides  updated     content
h_sap_cdna     34,019       48,122,697     2002-01-22  All human cDNAs from Ensembl
h_sap_gs_cdna  256,400      149,005,312    2002-01-30  All human cDNAs, including cDNAs predicted with Genscan, from Ensembl
h_sapiens      171,921      3,236,208,880  2001-12-14  The complete human genome from ensembl-1.1.0_golden_path
c_elegans      6            100,258,522    2001-12-14  The complete Caenorhabditis elegans genome sequence (six chromosomes)

length of the oligonucleotide
Specify the length of the oligonucleotide. LNA-spiked oligonucleotides can be chosen shorter than oligonucleotides without LNA, because LNA increases the energy of the hybridization. A typical length is around 40 nt.

LNA frequency
Specify the frequency of LNA-spiked nucleotides in the oligonucleotide. A value of 3 replaces every third nucleotide with an LNA analog. A value of 0 results in oligonucleotides without LNA spiking; this is not recommended, because the hybridization properties of the oligonucleotide suffer from the lack of LNA. For example, a frequency of 3 gives:

AcgTgcGcgGta

LNA phase
Specify the phase of the LNA-spiked nucleotides in the oligonucleotide. A value of 0 spikes the first nucleotide of the oligonucleotide with LNA and then continues the spiking according to the frequency, e.g.:

0: AcgTgcGcgGta
1: aCgtGcgCggTa

LNA end length
Specify how many nucleotides at each end of the oligonucleotide are spiked with LNA, e.g.:

3: ACGtgcgcgGTA
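
The three LNA parameters can be illustrated together with a short Python sketch that reproduces the examples in this manual. The function name spike_lna is hypothetical. The manual does not say how end spiking combines with periodic spiking, so the code assumes a position is spiked if either rule selects it.

```python
def spike_lna(seq, frequency=3, phase=0, end_length=0):
    """Return seq with LNA positions in capitals.

    Hypothetical helper, not the OligoDesign implementation.
    frequency:  spike every n-th nucleotide (0 disables periodic spiking)
    phase:      zero-based index of the first periodically spiked nucleotide
    end_length: number of nucleotides spiked at each end
    """
    seq = seq.lower()
    n = len(seq)
    out = []
    for i, base in enumerate(seq):
        periodic = frequency > 0 and i >= phase and (i - phase) % frequency == 0
        at_end = i < end_length or i >= n - end_length
        # Assumption: end spiking and periodic spiking are simply combined.
        out.append(base.upper() if periodic or at_end else base)
    return "".join(out)


print(spike_lna("acgtgcgcggta", frequency=3, phase=0))       # AcgTgcGcgGta
print(spike_lna("acgtgcgcggta", frequency=3, phase=1))       # aCgtGcgCggTa
print(spike_lna("acgtgcgcggta", frequency=0, end_length=3))  # ACGtgcgcgGTA
```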

minimum distance between the oligonucleotides
Specify the minimum distance between neighbouring oligonucleotides. This option is useful if more than one oligonucleotide is used for a single target gene, either to prevent the oligonucleotides from overlapping or to ensure that they target different parts of the gene.

blast score cutoff
Select the BLAST score cutoff.

blast word length
Select the word length used by BLAST.

search strand
Select which strands to search. For cDNA, the direct strand is sufficient.

blast expectation cutoff
Select the expectation cutoff used by BLAST.

min tm
Select the minimum melting temperature desired for the oligonucleotides.

max tm
Select the maximum melting temperature desired for the oligonucleotides.

3. Submit the job to the oligonucleotide design system.

Hit the submit button. Processing takes from a few seconds up to a few minutes, depending on the length of the sequence and the load on the server. Hit the refresh button to see whether the result is ready, or wait until the page updates automatically.

4. The proposed optimal oligonucleotide.
The oligonucleotide.
The sequence of the optimal oligonucleotide is shown; nucleotides that have been spiked with LNA are shown in capital letters. Information about the quality of the oligonucleotide is given below it, and it is recommended to check this information before accepting the oligonucleotide. The oligonucleotide information might look like this:

AaaTgcAacAatTgcGgaAgtAcaAcgAca
 
seqno: 1  oligono: 1
 score=0.43 , tm=0
 start=62, end=92
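
The example output can be cross-checked against the example target sequence with a few lines of Python. The sketch below lists the LNA positions (capital letters) and locates the oligonucleotide in the first 120 nt of the Z81127 example. In this example the reported start and end behave like a zero-based start with an exclusive end; that is inferred from the example only, not from a documented convention.

```python
# First two lines of the example target sequence from this manual.
target = (
    "aaagtgactaacggaggatctcgccaattatctttgagagacaaaactgaaactccttat"
    "ttaaatgcaacaattgcggaagtacaacgacatgcatccatcctcaatatcaatttttgg"
)
oligo = "AaaTgcAacAatTgcGgaAgtAcaAcgAca"

# Capital letters mark the LNA-spiked nucleotides.
lna_positions = [i for i, base in enumerate(oligo) if base.isupper()]

# Locate the unmodified oligo sequence in the target.
start = target.find(oligo.lower())
end = start + len(oligo)

print(lna_positions)  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
print(start, end)     # 62 92
```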

The first line gives the sequence of the oligonucleotide. The seqno is the sequence number (always 1). The oligono is the oligonucleotide number: 1 is the first and best oligonucleotide, 2 is the second best, and so on. The score is a number between 0 and 1; if all selection criteria are perfectly fulfilled, the score will be close to one. Tm is the melting temperature of the oligonucleotide.

Show other hits.
The score indicates whether there is any cross-hybridization to other genes in the genome; if there is none, it is close to one. The detailed view appears after checking the check box and pressing the refresh button. Possible cross-hybridizations are shown. In the example below, the cross-hybridizations are weak and unlikely to pose any problem.

AaaTgcAacAatTgcGgaAgtAcaAcgAca
 |||||||  ||||  ||| | ||||| || 22 matches
gaatgcaatgattggagaaattcaacgtca CHROMOSOME_V
||||||||||                     10 matches
aaatgcaaca                     CHROMOSOME_X
||||||||||                     10 matches
aaatgcaaca                     CHROMOSOME_X
|||                             3 matches
aaa                            CHROMOSOME_X
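
The match counts in this listing can be reproduced with a small Python sketch. It compares the oligonucleotide with each hit position by position, as the listing is laid out, and reports the number of matches and the longest uninterrupted run. The function name match_stats is hypothetical, and the ungapped, left-aligned comparison is an assumption taken from the layout of the example.

```python
def match_stats(oligo, hit):
    """Return (matches, longest_run) for an ungapped, left-aligned comparison."""
    matches = 0
    longest = run = 0
    # zip stops at the shorter sequence, so short hits are handled too.
    for a, b in zip(oligo.lower(), hit.lower()):
        if a == b:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return matches, longest


oligo = "AaaTgcAacAatTgcGgaAgtAcaAcgAca"
hits = ["gaatgcaatgattggagaaattcaacgtca", "aaatgcaaca", "aaatgcaaca", "aaa"]
stats = [match_stats(oligo, hit) for hit in hits]
print(stats)                      # [(22, 7), (10, 10), (10, 10), (3, 3)]
print(max(m for m, _ in stats))   # 22
print(max(r for _, r in stats))   # 10
```

The two maxima agree with the max_match (22) and max_stretch (10) values in the score details of this example.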

Show the oligonucleotide secondary structure.
The possible secondary structure of the oligonucleotide is shown here; a score close to one indicates that the secondary structure is of little consequence.

AaaTgcAacAatTgcGgaAgtAcaAcgAca
((.((   )).))                 

Matching parentheses show nucleotides that might bind to each other; dots show non-canonical bindings.
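
To read such a structure line programmatically, the matching parentheses can be paired with a stack. The Python sketch below is a generic bracket-matching routine applied to the example above; the function name paired_positions is hypothetical, and dots and spaces are simply skipped.

```python
def paired_positions(structure):
    """Return sorted (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        # Dots and spaces carry no pairing information here.
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


oligo = "AaaTgcAacAatTgcGgaAgtAcaAcgAca"
structure = "((.((   )).))"
for i, j in paired_positions(structure):
    print(i, j, oligo[i].lower() + "-" + oligo[j].lower())
```

For the example this pairs positions 0 and 12 (a-t), 1 and 11 (a-t), 3 and 9 (t-a), and 4 and 8 (g-c), counting from zero.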

Show structure of the target sequence.
This score is low if the oligonucleotide matches within a strong secondary structure of the target gene. Parts of the target gene that are in strong secondary structures are marked with #.

Show details about the score.
The score indicates to which extent the oligonucleotide conforms to the selection criteria; if it fits all criteria, the score will be close to one. The score is found by combining the scores of the individual criteria in a neural network weighting scheme.

oligonucleotide score :  0.43
 max_match          22 score= 0.97 x   1 (    30     5  0.9)
 max_stretch        10 score= 1.00 x   1 (    20     2  0.9)
 self_hyp           14 score= 0.92 x   1 (    25    10  0.9)
 target_struct       0 score= 0.96 x   1 (    30    20  0.9)
 tm                  0 score= 0.00 x   5 (     2   0.1  0.9)
 tm_max              0 score= 0.00 x   1 (    95     2  0.9)
 tm_min              0 score= 0.00 x   1 (    40     2  0.9)

The first line gives the combined score of the oligonucleotide. The max_match line gives the maximal number of matching nucleotides found, the score, the weight of the score, the threshold, and two numbers indicating the slope of the threshold. The max_stretch value is the longest continuous run of matching nucleotides in the genome. The self_hyp (self-hybridization) score indicates the enthalpy change caused by secondary structure in the oligonucleotide. The target_struct value gives how many nucleotides of the oligonucleotide lie within a strong secondary structure of the target gene. There are three tm indicators: tm_min and tm_max each give the tm of the probe and a score indicating the distance to the threshold, and the tm value is the combined score.
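
For anyone post-processing this output, each criterion line can be split into its fields with a regular expression. The Python sketch below only reads the fields in the order described above. The field names (value, score, weight, threshold, slope_a, slope_b) are labels chosen for this illustration, and the sketch does not attempt to reproduce how the neural network combines the scores, since that is not specified here.

```python
import re

# Field layout as described in the manual:
# name, value, score, weight, (threshold, slope number 1, slope number 2)
LINE = re.compile(
    r"\s*(?P<name>\w+)\s+(?P<value>[\d.]+)\s+score=\s*(?P<score>[\d.]+)"
    r"\s+x\s+(?P<weight>[\d.]+)\s+\(\s*(?P<threshold>[\d.]+)"
    r"\s+(?P<slope_a>[\d.]+)\s+(?P<slope_b>[\d.]+)\s*\)"
)


def parse_criterion(line):
    """Return (name, fields) for one criterion line of the score details."""
    match = LINE.match(line)
    if match is None:
        raise ValueError("unrecognized criterion line: %r" % line)
    fields = match.groupdict()
    name = fields.pop("name")
    return name, {key: float(value) for key, value in fields.items()}


name, fields = parse_criterion(
    " max_match          22 score= 0.97 x   1 (    30     5  0.9)"
)
print(name, fields["value"], fields["score"], fields["weight"])
```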

Select alternative oligonucleotide.
The oligonucleotide with the highest score is shown; alternative oligonucleotides can be selected from the list.
