Lab. of Computer and Structural Analysis of Biopolymers, V. A. Engelhardt Institute of Molecular Biology, Moscow, Russia
We have undertaken an exhaustive statistical analysis of the amino acid
sequences at the C-terminal ends of proteins and of nucleic acid sequences
at the 5' side of the stop codons. Even in a relatively limited set of
protein sequences from E. coli, Lys was clearly over-represented at the
(-1) position. In mammalian proteins, Lys and Cys are over-represented
(Table 1), mainly in the C-terminal region.
For the E. coli and human 3'-terminal coding nucleotide sequences and the
deduced C-terminal peptides, we found a prominent bias in amino acid
frequencies. In E. coli, positively charged amino acids are preferred at the
(-1-8) positions, while in humans, Lys, Arg, Cys, Ser, Glu, and Phe are
over-represented within the C-terminal nonamer. Under-represented amino acids
were mostly non-polar and were detected predominantly at the (-1) position.
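
The kind of positional analysis described above can be sketched as follows. This is a minimal illustration only: the abstract does not state the statistical procedure actually used, so the function name `cterm_enrichment`, the `window` parameter, and the choice of whole-sequence composition as the background are all assumptions made for this example. It reports, for each position counted back from the C-terminus, the ratio of a residue's observed frequency to its background frequency; ratios above 1 suggest over-representation.

```python
from collections import Counter


def cterm_enrichment(sequences, window=9):
    """Return {position: {residue: observed/background ratio}}.

    Positions run from -1 (last residue) to -window.
    Hypothetical helper; not the authors' published method.
    """
    # Background: residue frequencies pooled over all full-length sequences.
    background = Counter()
    for seq in sequences:
        background.update(seq)
    total = sum(background.values())

    result = {}
    for offset in range(1, window + 1):
        # Residues found at this offset from the C-terminus;
        # sequences shorter than the offset are skipped.
        column = Counter(seq[-offset] for seq in sequences if len(seq) >= offset)
        n = sum(column.values())
        result[-offset] = {
            res: (count / n) / (background[res] / total)
            for res, count in column.items()
        }
    return result


if __name__ == "__main__":
    # Toy sequences, invented purely to exercise the function.
    toy = ["MAAK", "MGGK", "MAGK"]
    for pos, ratios in cterm_enrichment(toy, window=2).items():
        print(pos, ratios)
```

A real analysis would also need a significance test and a non-redundant sequence set; neither is attempted in this sketch.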
The local C-terminal bias revealed in proteins from prokaryotes up to higher
organisms prompted us to apply the approach to other evolutionary groups,
such as yeast and A.bacteria proteins. Despite the very limited number of
A.bacteria sequences, we observed over-representation of Lys, Arg,
Glu, and Gln in the (-1-4) C-end peptide. Analysis of the set of yeast
sequences clearly shows an extended region of over-representation, mainly
of Lys, Arg, and Glu.
In summary, we assume that the bias in the amino acid composition of the C-terminus is related to factors such as stabilization of the protein globule and fixation of the C-terminal peptides via ionic, H-bond, and S-S contacts, rather than to modulation of translation termination via a prevalence of polar amino acids coupled with a deficiency of non-polar amino acids acting at the (-1) and (-2) positions.