Publications
2024
-
(2024) Transcription. Abstract
Transcription factors (TFs) intricately navigate the vast genomic landscape to locate and bind specific DNA sequences for the regulation of gene expression programs. These interactions occur within a dynamic cellular environment, where both DNA and TF proteins experience continual chemical and structural perturbations, including epigenetic modifications, DNA damage, mechanical stress, and post-translational modifications (PTMs). While many of these factors impact TF-DNA binding interactions, understanding their effects remains challenging and incomplete. This review explores the existing literature on these dynamic changes and their potential impact on TF-DNA interactions.
-
Site-Specific Acetylation of the Transcription Factor Protein Max Modulates Its DNA Binding Activity (2024) ACS Central Science. Abstract
Chemical protein synthesis provides a powerful means to prepare novel modified proteins with precision down to the atomic level, enabling an unprecedented opportunity to understand fundamental biological processes. Of particular interest is the process of gene expression, orchestrated through the interactions between transcription factors (TFs) and DNA. Here, we combined chemical protein synthesis and high-throughput screening technology to decipher the role of post-translational modifications (PTMs), e.g., Lys-acetylation, on the DNA binding activity of Max TF. We synthesized a focused library of singly, doubly, and triply modified Max variants including site-specifically acetylated and fluorescently tagged analogs. The resulting synthetic analogs were employed to decipher the molecular role of Lys-acetylation on the DNA binding activity and sequence specificity of Max. We provide evidence that the acetylation sites at Lys-31 and Lys-57 significantly inhibit the DNA binding activity of Max. Furthermore, by utilizing high-throughput binding measurements, we assessed the binding activities of the modified Max variants across diverse DNA sequences. Our results indicate that acetylation marks can alter the binding specificities of Max toward certain sequences flanking its consensus binding sites. Our work provides insight into the hidden molecular code of PTM-TFs and DNA interactions, paving the way to interpret gene expression regulation programs.
2023
-
(2023) Angewandte Chemie - International Edition. 62, 47, e202310913. Abstract
The chemical synthesis of site-specifically modified transcription factors (TFs) is a powerful method to investigate how post-translational modifications (PTMs) influence TF-DNA interactions and impact gene expression. Among these TFs, Max plays a pivotal role in controlling the expression of 15% of the genome. The activity of Max is regulated by PTMs; Ser-phosphorylation at the N-terminus is considered one of the key regulatory mechanisms. In this study, we developed a practical synthetic strategy to prepare homogeneous full-length Max for the first time, to explore the impact of Max phosphorylation. We prepared a focused library of eight Max variants with distinct modification patterns, including mono-phosphorylated and doubly phosphorylated analogues at Ser2/Ser11 as well as fluorescently labeled variants, through native chemical ligation. Through comprehensive DNA binding analyses, we discovered that the phosphorylation position plays a crucial role in the DNA-binding activity of Max. Furthermore, in vitro high-throughput analysis using DNA microarrays revealed that the N-terminus phosphorylation pattern does not interfere with the DNA sequence specificity of Max. Our work provides insights into the regulatory role of Max's phosphorylation on the DNA interactions and sequence specificity, shedding light on how PTMs influence TF function.
-
(2023) Science (New York, N.Y.). 381, 6664, p. eadd1250 Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
-
(2023) Proceedings of the National Academy of Sciences of the United States of America. 120, 11, e221742212. Abstract
Somatic mutations are highly enriched at transcription factor (TF) binding sites, with the strongest trend being observed for ultraviolet light (UV)-induced mutations in melanomas. One of the main mechanisms proposed for this hypermutation pattern is the inefficient repair of UV lesions within TF-binding sites, caused by competition between TFs bound to these lesions and the DNA repair proteins that must recognize the lesions to initiate repair. However, TF binding to UV-irradiated DNA is poorly characterized, and it is unclear whether TFs maintain specificity for their DNA sites after UV exposure. We developed UV-Bind, a high-throughput approach to investigate the impact of UV irradiation on protein-DNA binding specificity. We applied UV-Bind to ten TFs from eight structural families, and found that UV lesions significantly altered the DNA-binding preferences of all the TFs tested. The main effect was a decrease in binding specificity, but the precise effects and their magnitude differ across factors. Importantly, we found that despite the overall reduction in DNA-binding specificity in the presence of UV lesions, TFs can still compete with repair proteins for lesion recognition, in a manner consistent with their specificity for UV-irradiated DNA. In addition, for a subset of TFs, we identified a surprising but reproducible effect at certain nonconsensus DNA sequences, where UV irradiation leads to a high increase in the level of TF binding. These changes in DNA-binding specificity after UV irradiation, at both consensus and nonconsensus sites, have important implications for the regulatory and mutagenic roles of TFs in the cell.
2020
-
(2020) Nature. 587, 7833, p. 291-296 Abstract
Transcription factors recognize specific genomic sequences to regulate complex gene-expression programs. Although it is well-established that transcription factors bind to specific DNA sequences using a combination of base readout and shape recognition, some fundamental aspects of protein-DNA binding remain poorly understood. Many DNA-binding proteins induce changes in the structure of the DNA outside the intrinsic B-DNA envelope. However, how the energetic cost that is associated with distorting the DNA contributes to recognition has proven difficult to study, because the distorted DNA exists in low abundance in the unbound ensemble. Here we use a high-throughput assay that we term SaMBA (saturation mismatch-binding assay) to investigate the role of DNA conformational penalties in transcription factor-DNA recognition. In SaMBA, mismatched base pairs are introduced to pre-induce structural distortions in the DNA that are much larger than those induced by changes in the Watson-Crick sequence. Notably, approximately 10% of mismatches increased transcription factor binding, and for each of the 22 transcription factors that were examined, at least one mismatch was found that increased the binding affinity. Mismatches also converted non-specific sites into high-affinity sites, and high-affinity sites into super sites that exhibit stronger affinity than any known canonical binding site. Determination of high-resolution X-ray structures, combined with nuclear magnetic resonance measurements and structural analyses, showed that many of the DNA mismatches that increase binding induce distortions that are similar to those induced by protein binding, thus prepaying some of the energetic cost incurred from deforming the DNA. Our work indicates that conformational penalties are a major determinant of protein-DNA recognition, and reveals mechanisms by which mismatches can recruit transcription factors and thus modulate replication and repair activities in the cell.
-
(2020) Biomolecules. 10, 9, p. 1-22 1299. Abstract
In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on the sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences, when enriched with thymine bases, contain the signal for inducing abortive transcription, whereas certain repetitive sequence elements embedded in promoter regions constitute the signal for inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we suggest that repetitive sequence elements within the promoter DNA modulate the nonlocal base pair stability of its double-stranded form. This stability profoundly influences the reaction coordinates of the productive initiation via pausing.
2019
-
(2019) JoVE journal. 2019, 152, e59737. Abstract
DNA primase synthesizes short RNA primers that initiate DNA synthesis of Okazaki fragments on the lagging strand by DNA polymerase during DNA replication. The binding of prokaryotic DnaG-like primases to DNA occurs at a specific trinucleotide recognition sequence. It is a pivotal step in the formation of Okazaki fragments. Conventional biochemical tools that are used to determine the DNA recognition sequence of DNA primase provide only limited information. Using a high-throughput microarray-based binding assay and consecutive biochemical analyses, it has been shown that 1) the specific binding context (flanking sequences of the recognition site) influences the binding strength of the DNA primase to its template DNA, and 2) stronger binding of primase to the DNA yields longer RNA primers, indicating higher processivity of the enzyme. This method combines protein binding microarrays (PBM) with a primase activity assay and is designated high-throughput primase profiling (HTPP); it allows characterization of specific sequence recognition by DNA primase in unprecedented time and scalability.
-
(2019) Biochimica et Biophysica Acta - General Subjects. 1863, 9, p. 1343-1350 Abstract
The signal transducer and activator of transcription 3 (STAT3) protein is activated by phosphorylation of a specific tyrosine residue (Tyr705) in response to various extracellular signals. STAT3 activity was also found to be regulated by acetylation of Lys685. However, the molecular mechanism by which Lys685 acetylation affects the transcriptional activity of STAT3 remains elusive. By genetically encoding the co-translational incorporation of acetyl-lysine into position Lys685 and co-expression of STAT3 with the Elk receptor tyrosine kinase, we were able to characterize site-specifically acetylated, and simultaneously acetylated and phosphorylated STAT3. We measured the effect of acetylation on the crystal structure, and DNA binding affinity and specificity of Tyr705-phosphorylated and non-phosphorylated STAT3. In addition, we monitored the deacetylation of acetylated Lys685 by reconstituting the mammalian enzymatic deacetylation reaction in live bacteria. Surprisingly, we found that acetylation, per se, had no effect on the crystal structure, and DNA binding affinity or specificity of STAT3, implying that the previously observed acetylation-dependent transcriptional activity of STAT3 involves an additional cellular component. In addition, we discovered that Tyr705-phosphorylation protects Lys685 from deacetylation in bacteria, providing a new possible explanation for the observed correlation between STAT3 activity and Lys685 acetylation.
-
QBiC-Pred: Quantitative predictions of transcription factor binding changes due to sequence variants (2019) Nucleic Acids Research. 47, W1, p. W127-W135 Abstract
Non-coding genetic variants/mutations can play functional roles in the cell by disrupting regulatory interactions between transcription factors (TFs) and their genomic target sites. For most human TFs, a myriad of DNA-binding models are available and could be used to predict the effects of DNA mutations on TF binding. However, information on the quality of these models is scarce, making it hard to evaluate the statistical significance of predicted binding changes. Here, we present QBiC-Pred, a web server for predicting quantitative TF binding changes due to nucleotide variants. QBiC-Pred uses regression models of TF binding specificity trained on high-throughput in vitro data. The training is done using ordinary least squares (OLS), and we leverage distributional results associated with OLS estimation to compute, for each predicted change in TF binding, a P-value reflecting our confidence in the predicted effect. We show that OLS models are accurate in predicting the effects of mutations on TF binding in vitro and in vivo, outperforming widely-used PWM models as well as recently developed deep learning models of specificity. QBiC-Pred takes as input mutation datasets in several formats, and it allows post-processing of the results through a user-friendly web interface. QBiC-Pred is freely available at http://qbic.genome.duke.edu.
2018
-
(2018) Neurogenetics. 19, 3, p. 135-144 Abstract
Short structural variants (variants other than single nucleotide polymorphisms) are hypothesized to contribute to many complex diseases, possibly by modulating gene expression. However, the molecular mechanisms by which noncoding short structural variants exert their effects on gene regulation have not been discovered. Here, we study simple sequence repeats (SSRs), a common class of short structural variants. Previously, we showed that repetitive sequences can directly influence the binding of transcription factors to their proximate recognition sites, a mechanism we termed non-consensus binding. In this study, we focus on the SSR termed Rep1, which was associated with Parkinson's disease (PD) and has been implicated in the cis-regulation of the PD-risk SNCA gene. We show that Rep1 acts via the non-consensus binding mechanism to affect the binding of transcription factors from the GATA and ELK families to their specific sites located right next to the Rep1 repeat. Next, we performed an expression analysis to further our understanding regarding the GATA and ELK family members that are potentially relevant for SNCA transcriptional regulation in health and disease. Our analysis indicates a potential role for GATA2, consistent with previous reports. Our study proposes non-consensus transcription factor binding as a potential mechanism through which noncoding repeat variants could exert their pathogenic effects by regulating gene expression.
-
(2018) iScience. 2, p. 141-147 Abstract
Primases are key enzymes involved in DNA replication. They act on single-stranded DNA and catalyze the synthesis of short RNA primers used by DNA polymerases. Here, we investigate the DNA binding and activity of the bacteriophage T7 primase using a new workflow called high-throughput primase profiling (HTPP). Using a unique combination of high-throughput binding assays and biochemical analyses, HTPP reveals a complex landscape of binding specificity and functional activity for the T7 primase, determined by sequences flanking the primase recognition site. We identified specific features, such as G/T-rich flanks, which increase primase-DNA binding up to 10-fold and, surprisingly, also increase the length of newly formed RNA (up to 3-fold). To our knowledge, variability in primer length has not been reported for this primase. We expect that applying HTPP to additional enzymes will reveal new insights into the effects of DNA sequence composition on the DNA recognition and functional activity of primases.
2016
-
(2016) Proceedings of the National Academy of Sciences of the United States of America. 113, 47, p. E7409-E7417 Abstract
In the process of transcription elongation, RNA polymerase (RNAP) pauses at highly nonrandom positions across genomic DNA, broadly regulating transcription; however, molecular mechanisms responsible for the recognition of such pausing positions remain poorly understood. Here, using a combination of statistical mechanical modeling and high-throughput sequencing and biochemical data, we evaluate the effect of thermal fluctuations on the regulation of RNAP pausing. We demonstrate that diffusive backtracking of RNAP, which is biased by repetitive DNA sequence elements, causes transcriptional pausing. This effect stems from the increased microscopic heterogeneity of an elongation complex, and thus is entropy-dominated. This report shows a linkage between repetitive sequence elements encoded in the genome and regulation of RNAP pausing driven by thermal fluctuations.
2015
-
(2015) PLoS Computational Biology. 11, 8, e1004429. Abstract
Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with a wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C. elegans and D. melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding.
In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.
2014
-
(2014) Proceedings of the National Academy of Sciences of the United States of America. 111, 48, p. 17140-17145 Abstract
Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)-DNA binding. Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF-DNA binding preferences. We used high-throughput protein-DNA binding assays to measure the binding levels and free energies of binding for several human TFs to tens of thousands of short DNA sequences with varying repeat symmetries. Based on statistical mechanics modeling, we identify a new protein-DNA binding mechanism induced by DNA sequence symmetry in the absence of specific base-pair recognition, and experimentally demonstrate that this mechanism indeed governs protein-DNA binding preferences.
2013
-
(2013) Biophysical Journal. 105, 7, p. 1653-1660 Abstract
Recent experiments provide an unprecedented view of protein-DNA binding in yeast and human genomes at single-nucleotide resolution. These measurements, performed over large cell populations, show quite generally that sequence-specific transcription regulators with well-defined protein-DNA consensus motifs bind only a fraction of all consensus motifs present in the genome. Alternatively, proteins in vivo often bind DNA regions lacking known consensus sequences. The rules determining whether a consensus motif is functional remain incompletely understood. Here we predict that genomic background surrounding specific protein-DNA binding motifs statistically modulates the binding of sequence-specific transcription regulators to these motifs. In particular, we show that nonconsensus protein-DNA binding in yeast is statistically enhanced, on average, around functional Reb1 motifs that are bound as compared to nonfunctional Reb1 motifs that are unbound. The landscape of nonconsensus protein-DNA binding around functional CTCF motifs in human demonstrates a more complex behavior. In particular, human genomic regions characterized by the highest CTCF occupancy show a statistically reduced level of nonconsensus protein-DNA binding. Our findings suggest that nonconsensus protein-DNA binding is fine-tuned around functional binding sites using a variety of design strategies.
-
(2013) Biophysical Journal. 104, 5, p. 1107-1115 Abstract
Genome-wide binding preferences of the key components of eukaryotic preinitiation complex (PIC) have been recently measured at high resolution in Saccharomyces cerevisiae by Rhee and Pugh. However, the rules determining the PIC binding specificity remain poorly understood. In this study, we show that nonconsensus protein-DNA binding significantly influences PIC binding preferences. We estimate that such nonconsensus binding contributes statistically at least 2-3 kcal/mol (on average) of additional attractive free energy per protein per core-promoter region. The predicted attractive effect is particularly strong at repeated poly(dA:dT) and poly(dC:dG) tracts. Overall, the computed free-energy landscape of nonconsensus protein-DNA binding shows strong correlation with the measured genome-wide PIC occupancy. Remarkably, statistical PIC preferences of binding to both TFIID-dominated and SAGA-dominated genes correlate with the nonconsensus free-energy landscape, yet these two groups of genes are distinguishable based on the average free-energy profiles. We suggest that the predicted nonconsensus binding mechanism provides a genome-wide background for specific promoter elements, such as transcription-factor binding sites, TATA-like elements, and specific binding of the PIC components to nucleosomes. We also show that nonconsensus binding has genome-wide influence on transcriptional frequency.
2012
-
(2012) Biophysical Journal. 102, 8, p. 1881-1888 Abstract
Recent genome-wide measurements of binding preferences of ∼200 transcription regulators in the vicinity of transcription start sites in yeast have provided a unique insight into the cis-regulatory code of a eukaryotic genome. Here, we show that nonspecific transcription factor (TF)-DNA binding significantly influences binding preferences of the majority of transcription regulators in promoter regions of the yeast genome. We show that promoters of SAGA-dominated and TFIID-dominated genes can be statistically distinguished based on the landscape of nonspecific protein-DNA binding free energy. In particular, we predict that promoters of SAGA-dominated genes possess wider regions of reduced free energy compared to promoters of TFIID-dominated genes. We also show that specific and nonspecific TF-DNA binding are functionally linked and cooperatively influence gene expression in yeast. Our results suggest that nonspecific TF-DNA binding is intrinsically encoded into the yeast genome, and it may play a more important role in transcriptional regulation than previously thought.
2011
-
(2011) Biophysical Journal. 101, 10, p. 2465-2475 Abstract
Quantitative understanding of the principles regulating nucleosome occupancy on a genome-wide level is a central issue in eukaryotic genomics. Here, we address this question using budding yeast, Saccharomyces cerevisiae, as a model organism. We perform a genome-wide computational analysis of the nonspecific transcription factor (TF)-DNA binding free-energy landscape and compare this landscape with experimentally determined nucleosome-binding preferences. We show that DNA regions with enhanced nonspecific TF-DNA binding are statistically significantly depleted of nucleosomes. We suggest therefore that the competition between TFs and histones for nonspecific binding to genomic sequences might be an important mechanism influencing nucleosome-binding preferences in vivo. We also predict that poly(dA:dT) and poly(dC:dG) tracts represent genomic elements with the strongest propensity for nonspecific TF-DNA binding, thus allowing TFs to outcompete nucleosomes at these elements. Our results suggest that nonspecific TF-DNA binding might provide a barrier for statistical positioning of nucleosomes throughout the yeast genome. We predict that the strength of this barrier increases with the concentration of DNA binding proteins in a cell. We discuss the connection of the proposed mechanism with the recently discovered pathway of active nucleosome reconstitution.
-
(2011) Journal of Chemical Physics. 135, 6, 065104. Abstract
We predict analytically that diagonal correlations of amino acid positions within protein sequences statistically enhance protein propensity for nonspecific binding. We use the term promiscuity to describe such nonspecific binding. Diagonal correlations represent statistically significant repeats of sequence patterns where amino acids of the same type are clustered together. The predicted effect is qualitatively robust with respect to the form of the microscopic interaction potentials and the average amino acid composition. Our analytical results provide an explanation for the enhanced diagonal correlations observed in hubs of eukaryotic organismal proteomes.
-
(2011) Journal of Molecular Biology. 409, 3, p. 439-449 Abstract
Numerous experiments demonstrate a high level of promiscuity and structural disorder in organismal proteomes. Here, we ask what makes a protein promiscuous, that is, prone to nonspecific interactions, and structurally disordered. We predict that multi-scale correlations of amino acid positions within protein sequences statistically enhance the propensity for promiscuous intra- and inter-protein binding. We show that sequence correlations between amino acids of the same type are statistically enhanced in structurally disordered proteins and in hubs of organismal proteomes. We also show that structurally disordered proteins possess a significantly higher degree of sequence order than structurally ordered proteins. We develop an analytical theory for this effect and predict the robustness of our conclusions with respect to the amino acid composition and the form of the microscopic potential between the interacting sequences. Our findings have implications for understanding molecular mechanisms of protein aggregation diseases induced by the extension of sequence repeats.