Engelgardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
Codon usage varies significantly among organisms, tissues and even individual genes. It is known that the three positions of the reading frame tend to be occupied by different nucleotides. One of the most prominent biases is the preference for G at the first position of the reading frame. Several ideas have been put forward to explain this preference, including properties of the translation process and evolutionary arguments at both the DNA and the protein level. However, there is still no generally accepted explanation of this phenomenon. We think that studying pairwise (binary) correlations in nucleotide content among the three reading-frame positions may shed some light on this problem. These correlations can supply information about the nucleotide substitutions that lead to the bias. We do not, however, discuss the selection factors affecting the probability of such substitutions.
We observed statistical correlations between the nucleotide contents of the three reading-frame positions, which occur in a number of divergent organisms. Among the most prominent of these are negative correlations between G and T contents and between A and C contents, which are found in genes from different species regardless of their GC content and gene expression level. In the genes of higher eukaryotes, positive correlations between A and T contents and between G and C contents are found. The origin of such correlations may lie in features of the mutation process or of the translation mechanism. The effect is also connected with the amino-acid content of proteins. These correlations may also be taken into account in homology searches, for example when constructing nucleotide similarity matrices.
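
To make the kind of measurement discussed here concrete, the following is a minimal sketch and not the authors' actual procedure. It assumes that "nucleotide content" means the fraction of a given nucleotide at a given codon position within one coding sequence, and that the correlation is a Pearson coefficient computed across a set of genes. The function names `position_contents`, `pearson` and `content_correlation` are hypothetical and introduced only for illustration.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"


def position_contents(cds):
    """Return {(nucleotide, position): fraction} for codon positions 1, 2, 3.

    Assumption: `cds` is an in-frame coding sequence; trailing bases that
    do not complete a codon are ignored.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    contents = {}
    for pos in range(3):
        # Every third base starting at `pos`, limited to complete codons.
        column = cds[pos:n_codons * 3:3]
        for nt in NUCLEOTIDES:
            contents[(nt, pos + 1)] = column.count(nt) / n_codons
    return contents


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def content_correlation(genes, a, b):
    """Correlate, across genes, the content of a=(nt, pos) with b=(nt, pos)."""
    tables = [position_contents(g) for g in genes]
    return pearson([t[a] for t in tables], [t[b] for t in tables])
```

Under these assumptions, a call such as `content_correlation(genes, ("G", 1), ("T", 1))` would estimate the correlation between G and T contents at the first codon position over a list of coding sequences `genes`. Which pairs of positions are compared, and whether genes or some other unit serve as the observations, is a design choice that this sketch leaves open.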