In vivo and in vitro consistency of thermodynamic models for transcription regulation

The ability of cells to respond to external stimuli, by up- and downregulation of genes, is a key strategy for survival in changing environmental conditions. Models based on equilibrium statistical mechanics have been able to successfully predict in vivo fold changes in gene expression by computing weights of configurational states of the promoter where genes can be expressed. These models are based on the same, perhaps unintuitive, assumption that transcription initiation (an inherently nonequilibrium process) can indeed be effectively described by the equilibrium binding of transcription factors (TFs) and polymerases to the promoter. The few earlier studies that independently test this assumption [P. H. von Hippel et al., Proc. Natl. Acad. Sci. USA 71, 4808 (1974)] were published before much of our modern-day understanding of molecular biology was established, and their models fail to explain more recent experimental results. As such, it is not obvious that the original conclusions remain valid. Equilibrium models depend on fitted free energy differences between binding of TFs to specific operator sites versus nonspecific DNA. In this article we compare the fitted binding free energy of the well-studied LacI repressor to equilibrium binding constants measured in independent in vitro experiments. To make this comparison we take into account the distribution of binding energies of the transcription factor to the nonspecific DNA, and we adjust LacI binding constants to a common set of physiological conditions. We find that the fitted binding energies of the LacI repressor in vivo indeed agree with in vitro measured equilibrium binding constants, reestablishing the idea that equilibrium statistical mechanical models of transcriptional regulation should be viewed not merely as mathematical tools, but also as informative physical representations of underlying TF-DNA interactions.


I. INTRODUCTION
Half a century ago, von Hippel et al. [1] measured the binding constant of the transcription factor LacI to isolated DNA fragments on which the lac promoter of Escherichia coli was either present or not. They found that the ratio of the two equilibrium constants could reproduce fold changes in wild-type in vivo activity of the lac promoter, based on a simple equilibrium adsorption model built on the following premise: if the transcription factor is bound to the promoter, the gene stops transcribing. Equilibrium-based models have since been used successfully to quantitatively predict the transcriptional activity of many genes in the presence of transcription factors (TFs) [1–18].
It is important to note that all these models are based on the same cornerstone: the assumption that transcription, an inherently nonequilibrium process, is regulated by equilibrium binding of transcription factors. The transcriptional activity of a gene is taken as proportional to the (equilibrium) probability that its promoter region is occupied by RNA polymerase (RNAP), an assumption which is justified when the formation of the RNAP open complex on the promoter site of a gene is slow in comparison to the binding and unbinding kinetics of TFs over the genome [19–22]. Under these assumptions, equilibrium statistical mechanics is used to calculate the RNAP occupancy. The assumptions needed to treat transcription regulation as a quasiequilibrium process are subtle (see, e.g., Refs. [23,24] and the Supplemental Material [25]), and there exists a corresponding class of kinetic models which do not require as many assumptions, at the cost of requiring more parameters [26–32].

Thermodynamic models can be used to fit a binding free energy of TFs to their DNA sites from in vivo experiments, which yields values that are internally consistent [10,14,16,17,33–35]; that is, different experimental datasets obtained under different conditions can all be described by the same key quantities in a thermodynamic model. While internal consistency is a strong argument for the plausibility of a model, it does not provide a true verification that the model reflects the actual mechanism. It is far more likely that a model is grounded in reality when quantities have been verified by independent experiments, such as the determination of Avogadro's number [36], or the independent verification of many quantities in the standard model of particle physics [37]. As it stands, the possibility remains that the fitted quantity is a kinetic parameter that is erroneously interpreted as an equilibrium binding free energy.
To our knowledge, the seminal work of von Hippel et al. [1] provides the only independent evidence that equilibrium binding free energies can be used to predict fold changes in transcriptional activity [38]. However, knowledge of the statistical mechanics of TF binding, as well as of the architecture of the lac promoter, has increased to such an extent that it is not a priori obvious that their conclusions survive. In particular, binding of the TF to nonspecific DNA (the reference state) is still DNA sequence specific, so there is a distribution of binding energies. Moreover, the lac promoter used by von Hippel has multiple auxiliary binding sites for the transcription factor (which were not known at the time), which allow double, triple, or looped configurations. We will show in this article that the model of von Hippel et al. [1] is not able to reproduce more recent experiments because knowledge of the lac promoter was incomplete at the time. As such, an independent verification of the quantities in equilibrium thermodynamic models for transcription initiation is missing.
In this paper we aim to systematically and critically evaluate the idea that equilibrium binding of transcription factors is the mechanism behind the regulation of transcription initiation. To this end, we compare the free energy of TF binding, as fitted by thermodynamic equilibrium models, to multiple independent in vitro experiments. Here, we choose to focus on the binding of the lac repressor LacI of Escherichia coli (a well-studied model architecture with general applicability) and vary the binding target on the DNA, thereby changing the binding affinity of the TF to the substrate. The comparison needs to be independent: any quantitative experimental determinations should not be based on the assumption that fold changes in transcriptional activity are proportional to the equilibrium binding probability of RNAP to the promoter. Furthermore, the binding free energies that are found need to correspond to single sites only and not be complicated by cooperative binding of the transcription factor. This way, the binding free energies can be directly compared without introducing additional assumptions. Finally, we must be assured that binding free energies are compared to the appropriate reference energy, in both the thermodynamic model and the independent experiment.

As a rule, interactions between the molecular building blocks of living cells are strongly influenced by their surroundings [39]. For this reason, in order to independently determine the strength of a biological interaction, experiments in vivo would be preferred if they were available. However, in vivo measurements are limited to a few well-designed experiments, and not all parameters are accessible with current experimental methods. Direct comparison between in vivo and in vitro binding free energies is not a priori straightforward because of the presence of many different cellular components, as well as differences in pH, salt concentrations, and temperature. These differences are expected to significantly affect the affinity of TFs for their targets on DNA. However, the transcription factors that we consider have a very high affinity for DNA, even outside of their specific binding sites [1,40,41]. Consequently, these transcription factors are hardly ever found in solution and are overwhelmingly more likely to be bound to nonspecific DNA. As such, the relevant reference state of the TFs considered here is not the solution state, but rather the situation where the transcription factor is bound to nonspecific sequences on DNA. As a consequence, the influence of the crowded cell environment is expected to cancel.
The affinity of transcription factors for DNA depends on sequence-specific interactions such as hydrogen bonding [42–44], and is therefore dependent on the local nucleotide sequence [13,45,46]. As such, there is no single binding free energy for a transcription factor to nonspecific DNA. Rather, there is a distribution of binding free energies. In this article, we start by calculating a single effective binding free energy for a distribution of binding sites, inspired by previous work by von Hippel and Berg [47] and Gerland et al. [48]. The effective binding free energy is related to the equilibrium constant and can be measured in vitro. Next, we systematically gather LacI binding constants from many different in vitro equilibrium binding experiments in the literature, to both nonspecific DNA and to the four known operator sites, each with their own TF binding affinity. Taking care that the in vitro experiments were performed on single operator sites, we calculate the in vitro difference in LacI binding energy between specific and nonspecific DNA. In addition, we recalculate observed binding constants to a common set of conditions that are relevant in vivo. We find that the fitted energy differences from in vivo experiments using thermodynamic models closely match the independently measured in vitro binding free energy differences, providing a strong case that the quantity that governs transcriptional activity is indeed a true equilibrium binding free energy, and not an effective kinetic parameter.

II. METHODS
We will use the formalism of the grand canonical ensemble, a natural ensemble to work in when dealing with multichemical binding. We take the binding of TFs to different DNA sites as uncorrelated, similar to Ref. [34].
The observed (in vitro) binding constant of a protein binding to a DNA site is directly related to the binding free energy between these objects. We consider a protein P that binds to DNA sites D in the following equilibrium:

$$\mathrm{P} + \mathrm{D} \rightleftharpoons \mathrm{PD}, \tag{1}$$

with a binding constant

$$K = \frac{[\mathrm{PD}]}{[\mathrm{P}][\mathrm{D}]} = \frac{\theta}{(1-\theta)\,[\mathrm{P}]}, \tag{2}$$

where $\theta$ is the fraction of DNA sites occupied by P. The free concentration $[\mathrm{P}]$ is related to the chemical potential of P, $\mu$, through the well-known relation (see, e.g., Ref. [49])

$$\mu = k_B T \log x_P, \tag{3}$$

with $\mu$ measured relative to the reference state. Here, $x_P$ is the mole fraction of P, which for dilute solutions is related to $[\mathrm{P}]$ by

$$x_P \simeq [\mathrm{P}]\, v_w, \tag{4}$$

with $v_w$ the molecular volume of water. The implied volume scaling will later drop out and will not affect the final result, as we will eventually compare ratios of binding constants.

A. Identical DNA binding sites
If the protein P can bind (either in vivo or in vitro) to a specific site, s, we write down the grand canonical partition function of that site as

$$\Xi_s = \sum_{p=0}^{1} \lambda^{p} Z(p) = 1 + \lambda e^{-\beta \varepsilon_s}, \tag{5}$$

with $\lambda = e^{\beta\mu}$ the fugacity of a protein P that can adsorb to the lattice site, $\beta = (k_B T)^{-1}$, $p$ the occupancy of the site, and $Z(p)$ the relevant part of the canonical partition function. The first term corresponds to the state where the DNA binding site is free and is therefore given the weight 1. The second term corresponds to a state that has a single molecule of P adsorbed to the binding site. The relevant part of the canonical partition function for the occupied state is the Boltzmann exponent $\exp(-\beta \varepsilon_s)$ of the binding (free) energy $\varepsilon_s$ of the protein P to the (specific) binding site, s. A system of $N$ independent copies of this binding site has a grand canonical partition function of $\Xi_N = \Xi_s^N$. We can obtain the occupancy $\theta$ from the partition function by taking the partial derivative with respect to $\lambda$. It follows that the occupancy $\theta_s$ is given by the Langmuir isotherm

$$\theta_s = \frac{\lambda}{N} \frac{\partial \log \Xi_N}{\partial \lambda} = \frac{\lambda e^{-\beta \varepsilon_s}}{1 + \lambda e^{-\beta \varepsilon_s}}. \tag{6}$$

Using this adsorption isotherm and the relation between chemical potential and protein concentration in Eq. (3), we can express the binding constant $K_s$ in Eq. (2) as

$$K_s = \frac{\theta_s}{(1-\theta_s)\,[\mathrm{P}]} = \frac{\lambda e^{-\beta \varepsilon_s}}{[\mathrm{P}]} = v_w\, e^{-\beta \varepsilon_s}. \tag{7}$$

The binding constant reflects the equilibrium between the protein P bound to its specific site on the DNA and P in solution.
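As a numerical illustration of the single-site result, the following minimal Python sketch evaluates the Langmuir occupancy both in closed form and as the logarithmic derivative of the single-site partition function $1 + \lambda e^{-\beta\varepsilon_s}$. The function names, the finite-difference step, and the example values of the fugacity and binding energy are illustrative assumptions rather than quantities from this work; energies are expressed in units of $k_B T$.

```python
import math

def occupancy(lam, eps, beta=1.0):
    """Closed-form Langmuir occupancy of one site with binding free energy eps."""
    w = lam * math.exp(-beta * eps)  # statistical weight of the occupied state
    return w / (1.0 + w)

def occupancy_from_partition_function(lam, eps, beta=1.0, h=1e-6):
    """Occupancy as d log(Xi) / d log(lam), by a central finite difference."""
    def log_xi(l):
        return math.log(1.0 + l * math.exp(-beta * eps))
    return (log_xi(lam * math.exp(h)) - log_xi(lam * math.exp(-h))) / (2.0 * h)

# Hypothetical example values (not taken from the paper).
lam, eps = 1e-7, -15.0
theta = occupancy(lam, eps)
theta_fd = occupancy_from_partition_function(lam, eps)

# theta / ((1 - theta) * lam) equals exp(-beta * eps) for any fugacity,
# which is why the binding constant depends only on the binding free energy.
ratio = theta / ((1.0 - theta) * lam)
```

The last line shows why the fugacity drops out of the binding constant: dividing the odds of occupancy by the fugacity leaves only the Boltzmann factor of the binding free energy.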

B. Distribution of binding sites
The binding affinity of proteins P to nonspecific DNA varies with the sequence of the DNA. We consider a system of (nonspecific) DNA binding sites with a distribution in the binding free energy of P. The grand canonical partition function of adsorption onto a distribution of $N_{ns}$ binding sites is given by

$$\Xi_{ns} = \prod_{i=1}^{N_{ns}} \left(1 + \lambda e^{-\beta \varepsilon_i}\right), \tag{8}$$

where $\lambda = e^{\beta\mu}$ is again the fugacity of the TF, and $\varepsilon_i$ the binding free energy of site $i$. If we take the logarithm of this expression and isolate the factor $N_{ns}$, the resulting sum $N_{ns}^{-1} \sum_i \log(1 + \lambda e^{-\beta \varepsilon_i})$ can be interpreted as an ensemble average. Consequently, we can write

$$\log \Xi_{ns} = N_{ns} \left\langle \log\left(1 + \lambda e^{-\beta \varepsilon}\right) \right\rangle. \tag{9}$$

When the distribution is sufficiently narrow, around $\sigma \lesssim 2 k_B T$ for biologically relevant parameters (see Supplemental Material [50] and Refs. [51,52]), we can approximate Eq. (9) with

$$\log \Xi_{ns} \approx N_{ns} \log\left(1 + \lambda \left\langle e^{-\beta \varepsilon} \right\rangle\right). \tag{10}$$

This approximation also makes the factor that needs to be averaged independent of TF fugacity. The average $\langle \exp(-\beta \varepsilon) \rangle$ can be expanded into the series

$$\left\langle e^{-\beta \varepsilon} \right\rangle = \sum_{n=0}^{\infty} \frac{(-\beta)^n}{n!} \left\langle \varepsilon^n \right\rangle, \tag{11}$$

where $\langle \varepsilon^n \rangle$ is the $n$th raw moment of the distribution. This series is known as the moment-generating function of the distribution, $M(-\beta)$. It is convenient to introduce the cumulant-generating function $K(-\beta) = \log M(-\beta)$. The cumulant-generating function can also be expressed as a series expansion [53],

$$K(-\beta) = \sum_{n=1}^{\infty} \kappa_n \frac{(-\beta)^n}{n!}, \tag{12}$$

where $\kappa_n$ is the $n$th cumulant of the distribution. The advantage of using the cumulant-generating function is that cumulants are directly related to observable quantities of the distribution, such as the mean $\langle \varepsilon \rangle$, variance $\sigma^2$, and skewness $\gamma_1$. If we express Eq. (10) in terms of the cumulant-generating function, we obtain

$$\Xi_{ns} \approx \left[1 + \lambda \exp\left(\sum_{n=1}^{\infty} \kappa_n \frac{(-\beta)^n}{n!}\right)\right]^{N_{ns}}, \tag{13}$$

where we have also taken the exponent of both sides. We can define an effective energy as the sum of cumulants in the expansion,

$$\varepsilon_{\mathrm{eff}} \equiv -\frac{1}{\beta} \sum_{n=1}^{\infty} \kappa_n \frac{(-\beta)^n}{n!}, \tag{14}$$

so that the expression for the partition function in Eq. (13) becomes

$$\Xi_{ns} \approx \left(1 + \lambda e^{-\beta \varepsilon_{\mathrm{eff}}}\right)^{N_{ns}}. \tag{15}$$

The resulting partition function for a system with a distribution of binding sites is isomorphic to the partition function of a system with identical sites with a binding free energy equal to $\varepsilon_{\mathrm{eff}}$. The effective free energy is smaller (more negative) than the mean binding free energy of the distribution, since at finite temperatures the TFs favor binding at the lower energy sites. The effective energy greatly simplifies the adsorption isotherm corresponding to that system. The properties of the distribution are essentially condensed into a single parameter. Since in general we do not know the energy distribution of binding sites, the effective energy is a very useful quantity that implicitly carries the information of the distribution.
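To make the effective energy concrete, the sketch below computes $\varepsilon_{\mathrm{eff}} = -\beta^{-1} \log \langle e^{-\beta\varepsilon} \rangle$ for a sampled distribution of site energies and compares it with the cumulant expansion truncated after the variance term, $\langle\varepsilon\rangle - \beta\sigma^2/2$, which is exact for a Gaussian. The Gaussian shape, its mean and width, the sample size, and the helper name `effective_energy` are assumptions chosen for illustration only; energies are in units of $k_B T$.

```python
import math
import random

def effective_energy(energies, beta=1.0):
    """eps_eff = -(1/beta) * log(<exp(-beta * eps)>), with a log-sum-exp shift."""
    e_min = min(energies)
    total = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return e_min - math.log(total / len(energies)) / beta

# Hypothetical Gaussian distribution of nonspecific binding energies (units of k_B T).
mean, sigma = -5.0, 1.0
rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(mean, sigma) for _ in range(200_000)]

eps_eff = effective_energy(sample)
# For a Gaussian only the first two cumulants are nonzero (beta = 1 here).
eps_eff_cumulants = mean - 0.5 * sigma**2
sample_mean = sum(sample) / len(sample)
```

In this example the effective energy lies below the sample mean, reflecting the preference of the protein for the lower-energy sites in the distribution.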
We can calculate the occupancy of the nonspecific sites, $\theta_{ns}$, from the grand canonical partition function using Eq. (6),

$$\theta_{ns} = \frac{\lambda e^{-\beta \varepsilon_{\mathrm{eff}}}}{1 + \lambda e^{-\beta \varepsilon_{\mathrm{eff}}}}. \tag{16}$$

With this adsorption isotherm, combined with the expression for the binding constant in Eq. (2), we write down the nonspecific equilibrium constant

$$K_{ns} = v_w\, e^{-\beta \varepsilon_{\mathrm{eff}}}, \tag{17}$$

and we find an expression very similar to Eq. (7). By defining the effective energy, we have a reference point for the binding free energy of TFs to nonspecific DNA, which is less sensitive to offsets in free energy due to the presence of other solutes, and which can be used even when the distribution of binding free energies is unknown. Moreover, the difference between the specific binding free energy $\varepsilon_s$ and the effective binding free energy of TFs to nonspecific DNA can immediately be compared to the experiments from the ratio of the observed binding constants,

$$\Delta\varepsilon_s \equiv \varepsilon_s - \varepsilon_{\mathrm{eff}} = -k_B T \log \frac{K_s}{K_{ns}}. \tag{18}$$
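The conversion from a pair of measured binding constants to a free energy difference can be sketched as follows, assuming the relation $\varepsilon_s - \varepsilon_{\mathrm{eff}} = -k_B T \log(K_s/K_{ns})$ that follows when both constants share the same volume prefactor. The function name and the numerical values are hypothetical and do not correspond to entries in Tables I or II.

```python
import math

def delta_epsilon(k_specific, k_nonspecific):
    """Binding free energy difference (units of k_B T) from two binding constants."""
    return -math.log(k_specific / k_nonspecific)

# Hypothetical binding constants in M^-1, for illustration only.
k_s, k_ns = 1e10, 1e4
d_eps = delta_epsilon(k_s, k_ns)
```

Because only the ratio enters, any common prefactor or unit convention shared by the two binding constants cancels.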

III. RESULTS AND DISCUSSION
We obtained from the literature the experimental binding constants from in vitro binding assays of the wild-type repressor transcription factor LacI (wild-type E. coli), to the operator sequences O1, O2, and O3, the symmetrical operator Oid, and nonspecific DNA.
Table I lists observed binding constants of LacI to E. coli nonspecific DNA. The binding constants have been measured at various ionic strengths. Here we convert the binding constants to their values at a physiological salt concentration of 200 mM NaCl, room temperature, and pH 7.5 (see Supplemental Material [54]). Applying the relation between observed binding constant and the binding free energy in Eq. (17), we can calculate the effective binding free energy $\varepsilon_{\mathrm{eff}}$ of LacI to nonregulatory DNA.
In Table II we list observed binding constants of LacI to the lac operon wild-type operators and to the symmetric operator Oid. We recalculated the reported values of the binding constants to physiological conditions. In the Supplemental Material [54] we provide more details on these corrections.
In the work of Garcia and Phillips [14], LacI binding energies were fit to in vivo measurements of E. coli under minimal growth conditions, using a thermodynamic model. We see the model in Ref. [14] as representative of the class of thermodynamic models for transcription regulation. We show the values they report in Table III, together with in vitro data calculated from the ratio of the binding constants according to Eq. (18). Figure 1 (blue) shows a graphic representation of these results. Especially for the stronger binding operator sites, the correspondence between in vivo and in vitro data is convincing, within $1\,k_B T$. For the weaker binding O3 operator there is a larger mismatch between in vivo and in vitro data. This is likely a reflection of the scarcity of data published on the auxiliary operator, combined with a large uncertainty in determining low-affinity binding [69,70].

A. Earlier work
In the seminal work of von Hippel et al. [1], estimates from in vitro experiments for the binding constants of tetrameric LacI to its cognate site on the lac operon, as well as to nonspecific DNA, were used to calculate in vivo fold changes in transcriptional activity. They also found that the ratio between the in vitro binding constants is sufficient to quantitatively predict the basal transcription level of wild-type E. coli. This insight is an important step in the physical interpretation of thermodynamic models. However, with today's knowledge of the promoter architecture, it is not obvious that their conclusions are justified.

TABLE III. Comparison between in vitro and in vivo binding free energies of LacI to operator sequences O1, O2, O3, and Oid, offset by the binding free energy to nonspecific DNA. The in vivo binding affinities were obtained by Garcia and Phillips [14] (row 2) and Vilar and Saiz [16] (rows 3 and 4). The binding free energies in row 3 were rescaled to the size of the nonspecific genome in row 4.

FIG. 1 (legend). In vivo data: Garcia and Phillips [14] (blue points) and Vilar and Saiz [16] [green points, recalculated according to Eq. (22)]. The dotted line denotes the line x = y. Data are displayed as mean ± the standard error of the mean (SEM).

First, the effect of the auxiliary lac operators in the promoter neighborhood was unknown at the time of publication of von Hippel et al. [1]. The presence of auxiliary operator sites allows conformations where the DNA is looped between two operator sites simultaneously bound to tetrameric LacI, with a significantly increased free energy. The in vitro binding constant used by von Hippel et al. [1] consequently is an effective parameter due to the presence of auxiliary sites.
The work in Ref. [1] also considered the nonspecific sites on the DNA as a set of identical sites, without taking into account the sequence-specific interaction between the TF and DNA. As we have shown here, a distribution with a finite variance causes a shift in the effective free energy of the reservoir, but the same shift also affects the in vitro (measured) binding constant.
In wild-type E. coli cells, only ∼10 copies of LacI exist. The in vivo measurements from Jacob and Monod [71] that were used by von Hippel et al. [1] were therefore also taken in this limit of low LacI fugacity. Figure 2 shows a comparison of the simple repression model used in Ref. [1] with a model taking into account the full promoter architecture. It is for the following reason that a match was found between the calculated and observed fold change in transcriptional activity: in the low LacI fugacity limit the behavior of the lac promoter can be approximated with a simple repression model, where LacI always binds in a looped conformation as shown in Fig. 2(b). However, such a model fails when the LacI copy number increases. Had the gene activity measurements used in Ref. [1] been measured at a higher LacI copy number, their model would have overestimated the degree of repression by up to an order of magnitude.
Essentially, the work of von Hippel et al. [1] was far ahead of its time: for the lac promoter architecture, transcriptional activity is regulated by the equilibrium binding of LacI. Reexamined almost half a century later, their conclusions prove to be based on incomplete knowledge about the system. However, the systematic match between in vitro and in vivo binding free energy that can be seen from the literature published since shows that, at the core, the quantities in thermodynamic models can be verified from independent experiments.

FIG. 2. (a) Fold change of the wild-type lac promoter as function of LacI copy number R, compared to two different thermodynamic models. Data are from Jacob and Monod [71] and Oehler et al. [72,73]. In blue we show our full model for the intact lac promoter architecture (see Landman et al. [74]), with operator binding free energies taken from Garcia and Phillips [14] and other quantities taken from thermodynamic fits to other data from [72,73]. The green dotted line is the simple repression model used by von Hippel et al. [1] using the in vitro ratio of binding constants. The solid green line is the same model, but with the binding free energy calculated using the model in [74] (see the Supplemental Material for more details [75]). (b) Cartoon of the dominant conformation of LacI bound to the promoter architecture.

B. DNA availability
An important point to address here is the influence of the available number of nonspecific sites in vivo. In the analysis of the in vivo measurements by Garcia and Phillips [14] it is assumed that the whole bacterial genome is (immediately) available for the transcription factors. However, nucleoid-associated proteins and supercoiling of the genome are expected to influence the effective number of nonspecific sites. Fixing the number of nonspecific sites in a model will influence the effective binding free energy as extracted from experiments. Garcia and Phillips [14] fit fold-change data for a given number of repressor molecules to the (canonical) expression for the fold change of a gene regulated by a simple repressor [4,8]:

$$\text{Fold change} = \left(1 + \frac{R}{N_{ns}}\, e^{-\beta \Delta\varepsilon_s}\right)^{-1}, \tag{19}$$

with $R$ the number of repressor molecules, $N_{ns}$ the total number of base pairs in a cell, and $\Delta\varepsilon_s$ the binding free energy of the repressor to the operator relative to nonspecific DNA. Now let us define $N^*_{ns} \leq N_{ns}$ as the number of nonspecific sites accessible for transcription factors, that is, sites that are not supercoiled or otherwise compacted by nucleoid-associated proteins. Taking into account an effective number of available sites implies we have to replace $N_{ns}$ by $N^*_{ns}$ in Eq. (19). Not taking the effect into account leads to an error in the binding free energy $\delta\varepsilon_s$ of

$$\delta\varepsilon_s = k_B T \log \frac{N^*_{ns}}{N_{ns}}, \tag{20}$$

where $N^*_{ns}/N_{ns}$ denotes the fraction of available sites. When the effective number of available sites is within a factor of 2–3 lower than the total number of sites, the error does not exceed the uncertainty range of the in vitro binding free energies. A difference in availability of a full order of magnitude should be noticeable as a reduction of the binding free energy of approximately $2 k_B T$. This is relevant because DNA is usually significantly compacted [39,41], and TF binding to compacted DNA may be inhibited [76].
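The effect of a reduced number of accessible sites on a fitted binding energy can be checked with a few lines of Python. This sketch assumes the simple-repression fold change in the form $[1 + (R/N_{ns})\,e^{-\beta\Delta\varepsilon}]^{-1}$ and an apparent shift of $k_B T \log(N^*_{ns}/N_{ns})$; the genome size, the binding energy, the accessible fraction, and the function names are illustrative assumptions, with energies in units of $k_B T$.

```python
import math

def fold_change(repressors, n_sites, d_eps, beta=1.0):
    """Simple-repression fold change, 1 / (1 + (R / N) * exp(-beta * d_eps))."""
    return 1.0 / (1.0 + repressors / n_sites * math.exp(-beta * d_eps))

def availability_shift(fraction):
    """Apparent shift (k_B T) in the fitted energy when only `fraction` of sites is accessible."""
    return math.log(fraction)

N_NS = 4.6e6      # assumed: approximate E. coli genome size in base pairs
R = 10            # repressor copies, of the order quoted for wild-type cells
eps_true = -15.0  # hypothetical binding free energy in k_B T
fraction = 0.1    # hypothetical accessible fraction of nonspecific sites

# Fold change generated with the accessible reservoir ...
fc_accessible = fold_change(R, fraction * N_NS, eps_true)
# ... equals the fold change obtained with the full genome and a shifted energy.
fc_refit = fold_change(R, N_NS, eps_true + availability_shift(fraction))
```

A tenfold reduction in accessible sites is thus indistinguishable, in a fit that fixes the reservoir at the full genome size, from a binding energy that is lower by $\log 10 \approx 2.3\,k_B T$, while a reduction by a factor of 3 corresponds to about $1.1\,k_B T$.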

C. Nonspecific binding is the appropriate reference state
Vilar and Saiz [16] also report binding energies, based on fits to the experiments of Oehler et al. [72,73], but in contrast to the work of Garcia and Phillips [14], their model assumes that LacI is present in solution when not bound to its cognate site. Consequently, the binding free energies reported in their work differ significantly from both the binding free energies of Ref. [14] and the in vitro experiments (see Table III, row 3). Vilar and Saiz [16] use the experimentally determined relation that one molecule per cell corresponds to a cellular concentration of 1.5 nM to convert the number of repressor molecules in the cell given in Refs. [72,73] into units of concentration. They then use a similar expression to determine the binding free energy as in Eq. (19),

$$\text{Fold change} = \left(1 + [R]\, e^{-\beta \Delta G_s}\right)^{-1} \quad \text{[see Ref. [16], Eq. (11)]}, \tag{21}$$

with $\Delta G_s$ the binding free energy reported in Ref. [16] and the factor $[R]$ the concentration of LacI, divided implicitly by a reference concentration of 1 M, replacing $R/N_{ns}$ in Eq. (19). This substitution essentially rescales their result to a different reference state, introducing a shift in the binding free energy of

$$\Delta\varepsilon_s - \Delta G_s = k_B T \log \frac{10^3\, N_A V_{\mathrm{cell}}}{N_{ns}}, \tag{22}$$

with $V_{\mathrm{cell}}$ the cell volume and $N_A$ Avogadro's constant. The factor $10^3$ follows from converting the units of the reported dissociation constant from M to mol m$^{-3}$. Vilar and Saiz use the relation that 1.5 nM corresponds to one molecule per cell, fixing the size of the cell at $V_{\mathrm{cell}} = (N_A \times 1.5\ \mathrm{nM/molecule})^{-1} = 1.1\ \mu\mathrm{m}^3$. In Table III (row 4) and in Fig. 1 (green) we show the binding free energy of the operator sites of the lac operon, after recalculation to the size of the nonspecific genome. Contrary to the originally reported quantities, we see that there is a convincing match between the data from Vilar and Saiz [16] and Garcia and Phillips [14], and the in vitro data. This provides additional evidence that it is the nonspecific genome that acts as the relevant reference state for LacI.
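The arithmetic behind the change of reference state can be reproduced in a short Python sketch. It assumes that one molecule per cell corresponds to 1.5 nM, as quoted above, and that the nonspecific reservoir has the size of the E. coli genome, taken here as roughly $4.6 \times 10^6$ base pairs; that genome size, the variable names, and the form of the shift, $k_B T \log(10^3 N_A V_{\mathrm{cell}}/N_{ns})$, are assumptions of this illustration.

```python
import math

N_A = 6.02214076e23      # Avogadro's constant, 1/mol
C_ONE_MOLECULE = 1.5e-9  # mol/L for one molecule per cell, as quoted from Ref. [16]
N_NS = 4.6e6             # assumed: approximate E. coli genome size in base pairs

# Cell volume implied by "one molecule per cell = 1.5 nM".
v_cell_litre = 1.0 / (N_A * C_ONE_MOLECULE)
v_cell_um3 = v_cell_litre * 1e15  # 1 L = 1e15 cubic micrometres
v_cell_m3 = v_cell_litre * 1e-3   # 1 L = 1e-3 cubic metres

# Shift (units of k_B T) between the 1 M reference state and the
# nonspecific-genome reference state; 1e3 converts 1 M to mol per cubic metre.
shift_kT = math.log(1e3 * N_A * v_cell_m3 / N_NS)
```

Under these assumptions the implied cell volume is close to the $1.1\ \mu\mathrm{m}^3$ stated in the text, and the shift between the two reference states is about $5\,k_B T$, which is of the size needed to reconcile solution-referenced energies with genome-referenced ones.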

IV. CONCLUSION
The power of thermodynamic equilibrium models for transcription regulation has already been extensively demonstrated in the existing literature.However, the quantity governing the fold change in transcriptional activity is interpreted in these models as an equilibrium free energy, and while its value is internally consistent between different thermodynamic equilibrium models, the possibility remains that it is a hidden kinetic parameter.The justification needed to treat transcription regulation as an equilibrium process has so far been provided by the experiments of von Hippel et al. [1], yet we found that their conclusions are based on models that cannot account for current-day knowledge of transcription.
In this work, we have investigated the validity of thermodynamic equilibrium models for gene regulation beyond internal consistency by comparing the fitted binding free energy of transcription factors to independent in vitro experiments.We find that the agreement between in vivo and in vitro experiments is quantitative within the error range of the experiments.Our work provides evidence that the quantity that governs transcriptional activity is indeed a true equilibrium binding free energy, and not an effective kinetic parameter.This new evidence confirms what has essentially been (in hindsight) a conjecture: that equilibrium binding can be seen as an informative physical representation of TF action.
This result not only provides significant additional plausibility for thermodynamic models of gene regulation, but also points to a large fraction (being more than roughly one-third of the total genome size) of the nonspecific part of the genome being accessible for transcription factors to bind.

TABLE II. Binding constants of LacI to the wild-type operators O1, O2, and O3 of the lac operon and to the symmetric operator Oid.
FIG. 1. Comparison of in vivo and in vitro determinations of the binding free energy of the operator sites of the lac operon. The in vivo data were taken from Garcia and Phillips [14] and Vilar and Saiz [16].