(1) Centro de Investigación sobre Fijación de Nitrógeno, UNAM. Ap. Postal 565-A. 62210 Cuernavaca, Morelos
(2) University of California, Los Angeles, Los Angeles, California 90024-1569
We began an analysis of regulatory proteins in an attempt to reveal the circuitry of transcriptional control, and to develop methods on the information-rich bacterial systems that can then be applied to the more complex eukaryotic systems. Bacterial regulatory protein sequences in the Swissprot data base were analyzed with regard to the location of their helix-turn-helix (HTH) DNA-binding domains. We found that functional groups of proteins largely segregate on the basis of the location of their DNA-binding domains. Most repressors have such domains located near their NH2-terminus, whereas most activators have their domains located near their COOH-terminus. The large LysR family of dual regulators (many of which are negatively autoregulated) forms a distinct group with a repressor-like location. There is no apparent functional reason why repressor proteins should have NH2-terminal DNA-binding domains whereas activators should have COOH-terminal ones, suggesting that the groupings reflect evolutionary origin rather than functional constraint.
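The kind of tally described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names, the use of the domain midpoint, the 0.5 cut-off between the two halves of the protein, and the example records are all assumptions introduced here.

```python
from collections import Counter


def domain_location(start, end, length):
    """Classify a domain as NH2- or COOH-terminal.

    Coordinates are 1-based and inclusive. The midpoint rule and the
    0.5 threshold are illustrative assumptions, not taken from the study.
    """
    if not (1 <= start <= end <= length):
        raise ValueError("domain coordinates must lie within the protein")
    midpoint = (start + end) / 2
    return "NH2-terminal" if midpoint / length < 0.5 else "COOH-terminal"


def tally_by_function(records):
    """Count (function, location) pairs over (function, start, end, length) records."""
    counts = Counter()
    for function, start, end, length in records:
        counts[(function, domain_location(start, end, length))] += 1
    return counts


# Invented records for illustration only; these are not real Swissprot entries.
records = [
    ("repressor", 5, 24, 200),
    ("repressor", 10, 29, 350),
    ("activator", 250, 269, 300),
    ("activator", 20, 39, 280),
]

counts = tally_by_function(records)
for (function, location), n in sorted(counts.items()):
    print(function, location, n)
```

A table of such counts shows at a glance whether repressors and activators segregate by domain location, and which entries are exceptions.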
Encouraged by this result, we began to apply the analysis to the vast number of eukaryotic regulatory protein sequences that exist in genomic data bases. For example, we obtained the homeobox domains in 223 proteins from the Swissprot data base. As in the prokaryotic case, the analysis shows that location is strongly preserved in this large group of proteins. Approximately one third of the proteins in this very diverse group have their HTH domain located within 10 amino acids of position 50 from the COOH-terminus. The analysis was also applied to the 140 double zinc finger proteins in the data base, which form a very diverse group with regard to function. The location of the double zinc fingers is non-random and quite different from the preferred location of homeodomains, typically being in the NH2-terminal region. Because there is no apparent functional reason for these preferred locations, we believe that the proteins likely fall into a very small number of evolutionary groups, despite their diverse functions. As in the prokaryotic case, the analysis suggests that DNA-binding domains appeared very early in the evolution of these many regulatory proteins and that diversity was built onto this early base.
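The COOH-terminal measurement mentioned above can be illustrated with a short sketch. It assumes that a single reference residue per domain (here, hypothetically, the first residue of the domain) is counted from the COOH-terminus, with the last residue of the protein taken as position 1. The function names and the example data are invented for illustration and are not the study's data.

```python
def position_from_c_terminus(residue, length):
    """Position of a 1-based residue counted from the COOH-terminus.

    The last residue of the protein is position 1 (an assumed convention).
    """
    if not (1 <= residue <= length):
        raise ValueError("residue must lie within the protein")
    return length - residue + 1


def fraction_near(proteins, target=50, window=10):
    """Fraction of (residue, length) pairs within `window` of `target` from the COOH-terminus."""
    if not proteins:
        return 0.0
    hits = sum(
        1
        for residue, length in proteins
        if abs(position_from_c_terminus(residue, length) - target) <= window
    )
    return hits / len(proteins)


# Invented (reference residue, protein length) pairs for illustration only.
example = [(251, 300), (160, 400), (345, 400)]
print(fraction_near(example))
```

The same function, called with a different reference residue or measured from the NH2-terminus instead, would cover the zinc finger comparison.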
References
1. Bairoch and Apweiler, Nucleic Acids Res. 24:17-20 (1996).