Constraining the Higgs valence contribution in the proton

Non-perturbative gauge invariance under the strong and the weak interactions dictates that the proton contains a non-vanishing valence contribution from the Higgs particle. By introducing an additional parton distribution function (PDF), we investigate the experimental consequences of this prediction. The Herwig 7 event generator and a parametrized CMS detector simulation are used to obtain predictions for a scenario corresponding to the LHC Run II data set. We use these to assess the impact of the Higgs PDF on the pp → tt̄ process in the single-lepton final state. Comparing to the nominal simulation, we derive expected limits as a function of the shape of the valence Higgs PDF. We also investigate the process pp → tt̄Z at the parton level to add further constraints.


1 Introduction
In gauge theories, physical states must be gauge-invariant. This simple requirement is implemented perturbatively by the BRST symmetry [1,2].
One consequence of these considerations is that, in addition to the perturbative Higgs sea contribution [19][20][21], the operator describing the SM proton must necessarily also have a valence Higgs component [17]. If the center-of-mass energy exceeds the Higgs boson mass, proton-proton collider phenomenology can be affected [14,17]. Provided the valence contribution is large enough, a discovery of the effect might be possible at the LHC. At the FCC-hh, any effect of the valence Higgs contribution will be greatly amplified¹. We discuss the details in Section 2. To explore such a possible contribution, we use an ansatz for the valence Higgs PDF and study its impact on the pp → tt̄ and pp → tt̄Z processes at the parton level. As detailed in Section 3, this is done by simulating the processes with Herwig 7 [22] at the level of the full cross section.
For the process pp → tt̄, we also investigate a scenario corresponding to the LHC Run II data set. To this end, Section 4 describes the processing of events with the Delphes [23] software package, which parametrizes the reconstruction performance of, in our case, the CMS detector [24]. We obtain predictions for total cross sections and various differential distributions of discriminating observables at the detector level. Including sensible estimates of the experimental uncertainties, we derive expected upper limits at 95% CL of typically O(10⁻⁴) on the total Higgs valence contribution (Section 5). While the shapes of the Higgs valence PDFs considered in this work only give an indication of potential effects, our results encourage global fits of proton PDF sets including a more general form of the valence Higgs contribution.
A small set of preliminary results has already been made available in Ref. [25].

¹ We note that the valence Higgs implies that only full weak multiplets are present in the proton, which can potentially also strongly affect the sea contribution at the FCC-hh [19][20][21].

2 The structure of the proton
The requirement of manifest gauge invariance in a non-Abelian gauge theory necessitates composite operators [7,8]. See [14] for a detailed review, including an overview of the support for this and the following from lattice calculations.
For the SM, this leads only to minor differences compared to perturbative calculations [1]. This can be understood as follows [14]. Consider the observed 125-GeV scalar, viz. what is usually called the Higgs. It now needs to be described by a composite operator. A possible composite operator is

$$O_0(x) = \phi^\dagger(x)\,\phi(x), \tag{1}$$

where φ(x) is the full Higgs field, not the fluctuation field H(x) defined by φ(x) = v + H(x), where v is the vacuum expectation value. The particle described by O_0 is thus a singlet under the weak gauge symmetry, even though it is built from a field in the fundamental representation. At the same time, an operator like O_0 has the same structure as an operator describing a hadron in QCD. It thus describes a bound state and would, in principle, require a non-perturbative treatment. However, the particularities of the Brout-Englert-Higgs effect allow for an analytical treatment of (1), the Fröhlich-Morchio-Strocchi (FMS) mechanism [7,8]. In a fixed 't Hooft gauge, expand φ = v + H(x) in the correlator ⟨O_0† O_0⟩, with v a constant vector in weak-isospin space, and take the connected part only. This yields

\begin{align}
&\langle O_0^\dagger(x)\,O_0(y)\rangle_c \tag{2}\\
&\quad = \langle (v^\dagger H + H^\dagger v)(x)\,(v^\dagger H + H^\dagger v)(y)\rangle \tag{3}\\
&\qquad + \langle (v^\dagger H + H^\dagger v)(x)\,(H^\dagger H)(y)\rangle + \langle (H^\dagger H)(x)\,(v^\dagger H + H^\dagger v)(y)\rangle + \langle (H^\dagger H)(x)\,(H^\dagger H)(y)\rangle_c \tag{4}
\end{align}

In perturbation theory, each of the terms is individually physical, i.e. describes a physical initial and final state. Non-perturbatively, this is spoiled by the Gribov-Singer ambiguity, and only the sum is physical despite the individual perturbative BRST invariance of the terms [14]. Standard perturbation theory is equivalent to keeping the term (3) and expanding it into the perturbative series, while (4) is discarded. Viewing this as an expansion in both v and the usual perturbative series implies that, to leading order in v, the propagator of the bound state coincides with the one in perturbation theory to all orders in the other couplings. Because the poles coincide, the bound state therefore has the same mass and properties as the elementary Higgs.
Because the contributions (4) are small and the remainder of the SM works in a very similar way [7,8,14], the approximation is excellent and explains why standard perturbative calculations in the SM perform so well at the LHC.
However, this does not imply that the remaining terms (4) must be immeasurably small. First investigations on and off the lattice indeed hint that these contributions can become relevant in certain kinematic situations [17,18,25], though usually energy scales of order v or the Higgs mass are required. This suggests looking at the proton as an initial-state particle at the LHC, as we do in the following.
Because of the representation theory of SU(3) and SU(2), there is no way to create a baryon state in the SM from quarks alone that is simultaneously non-perturbatively gauge-invariant under both the weak and the strong interactions [17]. An additional contribution, a singlet under the strong interaction but not under the weak interaction, is necessary to create such a state. Because the J^P quantum numbers of the proton should be left unchanged, the Higgs field is the only possibility. Thus, schematically, a proton operator reads

$$p \sim \phi\, qqq .$$

Note that this state no longer carries isospin as 'up' or 'down'. This would-be flavor, i.e. being a proton or a neutron, is the weak gauge charge, which is now fully contracted with the quarks. Its role is taken over by the custodial charge of the Higgs [14,17]. The explicit breaking of the custodial symmetry in the SM creates the differences between neutrons and protons, including their mass difference, up to electromagnetic corrections. At leading order in the FMS mechanism, p ≈ v qqq, and thus ordinary QCD arises. Of course, the operator qqq still needs to be treated non-perturbatively within QCD. However, just as in (2) to (4), additional terms arise beyond leading order. In particular, in any scattering process (see [14]), schematically

$$\langle p\, X\rangle \sim v\,\langle qqq\, X\rangle + \langle H\, qqq\, X\rangle ,$$

where X contains all remaining operators. Beyond the leading contribution, additional terms arise with an explicit Higgs contribution in the initial state. Because the initial state is now involved, an explicit perturbative calculation, which may be possible for a lepton collider, is probably too complicated for the foreseeable future. Thus, the additional effects will be parametrized here, following [17], by an additional valence PDF for the Higgs.
It should be remarked that in many BSM scenarios the same considerations can lead to qualitatively different effects, rather than the small quantitative ones discussed here. In particular, they can lead to qualitatively different spectra and cross sections; see [14] for an overview. This strongly motivates searching for the corresponding SM effects, in order to understand the relevance of the underlying field-theoretical considerations for model building.

3 PDF input
3.1 The PDF structure
It remains to choose the Higgs PDF. As the aim here is a first exploration, we neglect DGLAP evolution and Q² dependence and work with a simple x-dependent PDF. DGLAP evolution is possible by choosing input valence PDFs and the corresponding evolution equations, as was done for the sea distributions in [19][20][21]. This is a necessary next step, which we defer to future work.
Even though the Higgs is now considered a valence particle, it is expected that some of the condensate features remain. Thus, we expect that the Higgs PDF should be concentrated, to leading order, around x ≈ 0, similar to the presumed color glass condensate of gluons. This is also in line with the considerations of [17], where the situation at lepton colliders was investigated.
Thus, we consider a Higgs PDF P_H(x) of a general form governed by several shape parameters c_i. The parameter c_0 describes the suppression of a valence Higgs carrying all the energy of the proton. The parameter c_5 allows for a valence peak of the Higgs in the PDF, while c_3 and c_4 set the widths around the condensate and around the valence-particle peak, respectively. The parameters c_6 and c_7 allow for balancing between the condensate and valence-particle contributions. Finally, the parameter c_1 allows for an enhancement of a low-energy valence Higgs. However, as we only consider on-shell PDFs, there is no contribution to a process if the Higgs cannot come on-shell. For a Higgs to come on-shell therefore requires at least x ≳ 0.02 at the LHC with a proton energy of 7 TeV. A summary of the optimal Higgs valence PDF parameters obtained in Section 3.3 is provided as an illustration in Table 1.
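The on-shell threshold quoted above is simple arithmetic. The following minimal sketch assumes that the Higgs must carry an energy of at least m_H, i.e. x·E_p ≥ m_H with m_H = 125 GeV, and neglects the proton mass and any transverse momentum; the function name is hypothetical.

```python
# Rough on-shell threshold for an initial-state Higgs.
# Assumption: x * E_p >= m_H, proton mass and Higgs virtuality neglected.
M_HIGGS_GEV = 125.0    # observed Higgs mass
E_PROTON_GEV = 7000.0  # proton energy quoted in the text


def min_momentum_fraction(m_h: float, e_p: float) -> float:
    """Smallest momentum fraction x for which x * e_p reaches m_h."""
    return m_h / e_p


x_min = min_momentum_fraction(M_HIGGS_GEV, E_PROTON_GEV)
print(f"x_min = {x_min:.4f}")  # prints x_min = 0.0179, i.e. x of order 0.02
```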
Because the Higgs does not carry quantum numbers, the only sum rule the Higgs PDF influences is the momentum sum rule [17]. Including the Higgs leads to a modification of this sum rule, (6), with an explicitly calculable function f. The parameter c can be viewed as the absolute normalisation of the Higgs PDF P_H and describes the 'fraction' of the proton made up by the Higgs. The c_f depend on the parton flavor, in principle necessitating a global PDF refit. Moreover, the valence Higgs can affect the shape of the quark and gluon PDFs as well, or modify their Q² dependence. At the lowest order considered here, however, the gluon contributes about 90% of the standard contributions, and thus only c_g plays a substantial role. As the Higgs is assumed to form a condensate-like structure while the gluon dominates the QCD partons at low x, a suitable assumption for now is c_f = c_g = c. With the usual PDFs fixed, we will thus not be able to maintain the sum rule exactly, but its violation is small for reasonable parameter ranges. In the following, we primarily consider the process pp → tt̄, because the top quark has a Yukawa coupling y_t ≈ 1. The inclusive cross section σ_pp→tt̄(c) for this process can be written in the form (7), where the c-independent σ_pp→tt̄ contains all ordinarily considered contributions in the initial state, i.e. quarks and gluons. Initial-state Higgs bosons are excluded from this term. The experimental uncertainty of 3% [26,27] will be used in Section 3.3 to provide a first constraint on c and on the parameters c_i in (6). Furthermore, the kinematics of processes with initial-state Higgs bosons and with QCD partons differ, and we therefore proceed to differential cross sections, determined analogously, with σ_pp→tt̄(c) given by (7) above.
In addition, we investigate the process pp → tt̄Z at the level of the full cross section (7) to further constrain the shape parameters c_i in (6).

3.2 Hard process
To simulate the process, we use the Herwig 7 event generator [22,28,29] with the Matchbox module [30] and the angular-ordered shower [31], with matrix elements provided by a combination of MadGraph5_aMC@NLO [32] and ColorFull [33].
Since a massive Higgs inside the proton cannot be strictly collinear to the incoming proton's momentum, we allow the Higgs to be space-like off-shell. To be precise, for a Higgs momentum p = (p⁰, q_⊥, p³) inside a proton of momentum P, we consider p² = −µ² < 0 with a longitudinal momentum fraction x = (p⁰ + p³)/(P⁰ + P³); the Higgs' transverse momentum is then fixed by its virtuality and by x. The momentum fraction is correspondingly limited to a finite interval, and we distribute the kinematic variables with a uniform azimuthal orientation φ of the Higgs transverse momentum, where the hadronic center-of-mass energy S implies an upper bound on the transverse momentum scale µ². Notice that in the limit m_H², m_p² ≪ S we in fact recover a collinear Higgs momentum across the entire range of the longitudinal momentum fraction x, as the distribution of the transverse momentum scale µ² is then peaked at zero. The proton's momentum after the Higgs has been extracted is simply given by momentum conservation, P' = P − p, and can be shown to satisfy (P')² = m_p² for the kinematics chosen above.

Figure 1: Top-quark rapidity and p_⊥ distributions at parton level. We separately show the parton-induced production ('qq̄/gg') and the cross sections from the gluon-Higgs (gH) and Higgs-Higgs (HH) induced sub-processes. Notice that the gH and Hg sub-processes are mirror-symmetric in the rapidity distribution, but identical in the p_⊥ distribution.

For the baseline of a standard proton, as well as for the QCD-parton-induced contributions, we use the MMHT2014nlo68cl PDFs [34] for quarks and gluons. The processes involving QCD partons are simulated at NLO with the initial states qq̄ and gg. As initial states with Higgs content we add gH and HH. However, NLO calculations with initial-state Higgs bosons are currently not supported by Herwig, and we thus evaluate these processes only at LO. The results are either evaluated directly at the partonic level or showered for input into Delphes using HepMC files. The latter is again done at NLO for the QCD-parton processes and at LO for processes with initial-state Higgs bosons. The parton showering, as well as the decays of the top quarks, are then performed with default settings, and in the case of the Higgs-induced sub-processes no multi-parton interactions take place.
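To make the off-shell kinematics described at the start of this section concrete, the following Python sketch constructs one configuration satisfying the three stated conditions: a proton with vanishing transverse momentum, a Higgs with x = (p⁰ + p³)/(P⁰ + P³) and p² = −µ², and a remnant P' = P − p on the proton mass shell. The relation q_⊥² = (1 − x)µ² − x²m_p² used here is our own consequence of these conditions (with light-cone components p^± = p⁰ ± p³ as an assumed convention), not necessarily the exact parametrization implemented in Herwig; all function names are hypothetical.

```python
import math


def minkowski_square(v):
    """Minkowski square E^2 - px^2 - py^2 - pz^2 of a four-vector (E, px, py, pz)."""
    e, px, py, pz = v
    return e * e - px * px - py * py - pz * pz


def from_light_cone(plus, minus, px, py):
    """Build (E, px, py, pz) from light-cone components p^+ = E + pz, p^- = E - pz."""
    return (0.5 * (plus + minus), px, py, 0.5 * (plus - minus))


def extract_offshell_higgs(P_plus, m_p, x, mu2, phi):
    """Split a proton (zero transverse momentum) into a space-like Higgs and a remnant.

    Returns the four-vectors (proton, higgs, remnant).
    """
    # Transverse momentum fixed by requiring (P - p)^2 = m_p^2 (hypothetical choice).
    qt2 = (1.0 - x) * mu2 - x * x * m_p * m_p
    if qt2 < 0.0:
        raise ValueError("mu2 too small for this x: q_perp^2 would be negative")
    qt = math.sqrt(qt2)
    P_minus = m_p * m_p / P_plus           # on-shell proton with P_perp = 0
    p_plus = x * P_plus                    # x = (p^0 + p^3) / (P^0 + P^3)
    p_minus = (qt2 - mu2) / p_plus         # enforces p^2 = p+ p- - qt^2 = -mu2
    proton = from_light_cone(P_plus, P_minus, 0.0, 0.0)
    higgs = from_light_cone(p_plus, p_minus, qt * math.cos(phi), qt * math.sin(phi))
    remnant = tuple(a - b for a, b in zip(proton, higgs))  # P' = P - p
    return proton, higgs, remnant


# Usage: a 6.5 TeV proton (P^+ close to 13 TeV), x = 0.1, mu = 50 GeV.
proton, higgs, remnant = extract_offshell_higgs(13000.0, 0.938, 0.1, 2500.0, 0.3)
print(minkowski_square(higgs))    # should be close to -mu2 = -2500
print(minkowski_square(remnant))  # should be close to m_p^2
```

Keeping the remnant on the proton mass shell is what fixes q_⊥ once x and µ² are chosen; for µ² ≪ S the Higgs becomes collinear, in line with the limit discussed in the text.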
As an example of the impact of the Higgs-induced sub-processes, Figure 1 shows their effect on typical reconstructed top-quark observables, at parton level and for the fixed-order hard process only.

3.3 Selection of the PDF from the total cross section
The Higgs PDF used as input to Delphes was selected by identifying the PDF which allowed for a maximal value of c in (7) without yielding a result outside the experimental error margin. Hence, we require

$$e^{t\bar t}_{l} \le \frac{\sigma_{pp\to t\bar t}(c)}{\sigma_{pp\to t\bar t}} \le e^{t\bar t}_{u},$$

where e_l^{tt̄} = 0.97 and e_u^{tt̄} = 1.03 [27]. In addition, we used the process pp → tt̄Z with the corresponding experimental corridor e_l^{tt̄Z} = 0.87 and e_u^{tt̄Z} = 1.13 [35]. We simulated sufficiently many events to make the statistical errors of Herwig negligible.
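A minimal Python sketch of this selection criterion, assuming (as the split into c-linear gH and c-quadratic HH contributions suggests) that the ratio R(c) = σ(c)/σ is a quadratic polynomial 1 + a·c + b·c². The coefficients a and b below are hypothetical placeholders, not values from our simulation. The largest admissible c is the first positive crossing of either corridor boundary; if R(c) never leaves the corridor, the function reports a "sweet spot" with no bound.

```python
import math


def max_allowed_c(a, b, e_low, e_up):
    """Largest c >= 0 such that R(c') = 1 + a c' + b c'^2 stays in [e_low, e_up]
    for all 0 <= c' <= c. Returns math.inf if R never leaves the corridor."""
    crossings = []
    for bound in (e_low, e_up):
        k = 1.0 - bound  # solve b c^2 + a c + k = 0, i.e. R(c) = bound
        if b == 0.0:
            roots = [-k / a] if a != 0.0 else []
        else:
            disc = a * a - 4.0 * b * k
            if disc < 0.0:
                roots = []
            else:
                s = math.sqrt(disc)
                roots = [(-a - s) / (2.0 * b), (-a + s) / (2.0 * b)]
        crossings.extend(r for r in roots if r > 0.0)
    return min(crossings) if crossings else math.inf


# Hypothetical response coefficients for the two processes (illustration only).
c_tt = max_allowed_c(a=1.0, b=0.0, e_low=0.97, e_up=1.03)
c_ttz = max_allowed_c(a=-2.0, b=10.0, e_low=0.87, e_up=1.13)
# Both corridors must hold, so the combined bound is the smaller one.
print(c_tt, c_ttz, min(c_tt, c_ttz))
```

Taking the minimum of the two bounds is what makes the tt̄ and tt̄Z corridors complementary: whichever process reacts more strongly for a given PDF shape sets the limit.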
This procedure is illustrated in Figure 2. Depending on the tuning parameters, vastly different values of c are possible. In fact, as the tuning parameter is varied, the cross section changes continuously from exceeding the experimental error bar to undershooting it. Hence, a sweet spot exists at which any value of c is consistent. Depending on the shape of the PDF, the sweet spots of the two processes can be moved closer together by varying the tuning parameter, though for none of the PDF shapes we used could they be made to coincide. The two processes thus have very different sensitivities to c, which makes them complementary constraints. Finally, the relative contributions from the gH and HH initial states swap between the two processes, with larger contributions from gH for the tt̄ final state and from HH for tt̄Z.
The resulting constraints on c are shown in Figure 3 for five different versions of the PDF. The two processes yield quite different constraints, again marking them as complementary tests. Generically, the process pp → tt̄Z is less constrained by the high-x behavior, i.e. at tuning parameters where the PDFs become narrower around small x. The process pp → tt̄ is less sensitive to distributions extending to rather large x, but is sensitive to very narrow or extremely broad distributions. The reason is that a large Higgs content requires the cross section from the Higgs initial states not to differ too much from the one from quarks and gluons. The contribution from the Higgs initial states to the tt̄ final state generically decreases with increasing tuning parameter: at the smallest values it is much larger than the one from quarks and gluons, while at the largest value of the tuning parameter it is smaller. Thus, there is a sweet spot at an intermediate value. For the tt̄Z final state the situation is similar, but here the HH initial state completely dominates over the gH initial state, and thus the quadratic dependence on the Higgs content drives the effect. This is due to the possibility that the Z is emitted in the initial state only.
Figure 3 also shows the 'optimal' PDFs, i.e. those which allow for a maximal contribution of the Higgs to the proton. The optimal parameter choices yield relatively similar PDFs, with values around 5 × 10⁻² for x between 0.1 and roughly 0.3, i.e. in a typical range for a valence particle. In the following, we concentrate on PDF 2, which allows for the largest Higgs content, though PDF 3 yields quite similar results.

3.4 Signatures in the final state at the partonic level
As will become clear in Section 4, the differential cross sections at the detector level substantially reduce the maximum Higgs content of the proton further. It is worthwhile to investigate the origin of this effect at the level of the non-hadronized final state. Although many different partial cross sections will be used in Section 4, there is a generic behavior. This is illustrated in Figure 4, which shows the differential cross section for the p_T spectrum of the top quark.
The top-quark spectrum becomes harder when the initial state contains a Higgs, and with two Higgs bosons in the initial state it is even harder than with one. Hence, no matter how small the Higgs content, a deviation eventually arises at sufficiently high transverse momentum (p_T) of the top. Thus, as experiments become more sensitive at large p_T, the allowed Higgs content is further reduced if no enhancement is observed. In fact, the corresponding experimentally observed spectrum [36] currently even shows a suppression compared to the expectation without a valence Higgs, creating an even stronger restriction of the Higgs content. This feature seems to be essentially generic within the set of PDFs we have investigated. Different shapes, Q² evolution, or a complete refit of all PDFs may change this behavior. If not, an unambiguous signal for the Higgs content of the proton would be a hardening of the top spectrum in this process.

4 Experimental constraints
In order to estimate current bounds from experiment, we perform event reconstruction with Delphes [23], using the CMS reconstruction-efficiency parametrisation for LHC Run II. The tt̄ process in the single-lepton final state allows a relatively pure signal selection and avoids the lower branching fraction of the W bosons in the dilepton final state. Jets are reconstructed with the FastJet package [37] using the anti-k_T algorithm [38] with a distance parameter R = 0.4. As described before, we simulate the tt̄ process, including the diagrams with an initial-state Higgs boson, with Herwig. Besides this process, we also generate the main backgrounds in the leptonic final states in order to achieve a realistic background prediction. Important backgrounds to the tt̄ process are the W+jets process, for which we include up to three extra partons, and tW production. We also simulate other small contributions from the tt̄Z, tt̄W, tZ, tWZ, WZ and Zγ processes, whose yields are mostly negligible. The background parton-level events are generated with MadGraph5_aMC@NLO v2.3.3 [32] and decayed using MadSpin [39,40]. Parton showering and hadronisation for the backgrounds are performed with Pythia 8.2 [41,42]. The W+jets sample is simulated at leading order in perturbative QCD, while all other background processes are simulated at next-to-leading order with MadGraph5_aMC@NLO. The statistical uncertainties on the background contributions are negligible in all cases. For validation purposes, we simulate the tt̄ process excluding the valence Higgs distribution, following the procedure of the background simulation. We find very good agreement of all relevant kinematic distributions between the Herwig 7 and the MadGraph5_aMC@NLO simulations. The generated samples are summarized in Table 2.
After event reconstruction, we require exactly one lepton (e or µ) with p_T(l) > 30 GeV and |η(l)| < 2.4 that satisfies a tight isolation criterion. Furthermore, we require at least four jets with p_T(j) > 30 GeV and |η(j)| < 2.5, at least one of which has to be identified as a b jet according to the Delphes specification. We remove reconstructed leptons within a cone of ∆R < 0.3 around any reconstructed jet with p_T > 30 GeV. These object definitions and event selection criteria are mildly tuned to reproduce simulated distributions in Ref. [43].
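The selection above can be sketched in Python as follows. The `Lepton` and `Jet` containers are hypothetical, and the tight isolation and b-tag decisions are assumed to be precomputed boolean flags (for instance as provided by the detector simulation). Lepton-jet overlap removal is applied before counting leptons, and no veto on additional, looser leptons is implemented; this is a sketch, not the analysis code used here.

```python
import math
from dataclasses import dataclass


@dataclass
class Lepton:
    pt: float        # GeV
    eta: float
    phi: float
    isolated: bool   # assumed: outcome of a tight isolation criterion


@dataclass
class Jet:
    pt: float        # GeV
    eta: float
    phi: float
    btag: bool       # assumed: b-tag decision from the detector simulation


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def passes_selection(leptons, jets):
    """Single-lepton ttbar selection following the criteria in the text."""
    # Overlap removal: drop leptons within Delta R < 0.3 of any jet with pT > 30 GeV.
    hard_jets = [j for j in jets if j.pt > 30.0]
    leptons = [l for l in leptons
               if all(delta_r(l.eta, l.phi, j.eta, j.phi) >= 0.3 for j in hard_jets)]
    good_leptons = [l for l in leptons
                    if l.pt > 30.0 and abs(l.eta) < 2.4 and l.isolated]
    good_jets = [j for j in jets if j.pt > 30.0 and abs(j.eta) < 2.5]
    return (len(good_leptons) == 1
            and len(good_jets) >= 4
            and any(j.btag for j in good_jets))
```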
In order to assess the sensitivity of a differential cross-section measurement to the Higgs valence contributions, we tested several observables for their discriminating power. The most promising candidates were found to be simple measures of the total momentum scale of the event, while angular observables and dimensionless ratios are less discriminating. In Figure 5 we show the spectrum of missing transverse energy (E_T^miss) and the p_T of the leading jet in the event. We defer refinements of such observables to later work and for now construct a simple rectangular grid of signal regions in terms of these two observables. The thresholds for E_T^miss are chosen as 0, 100, 200, and 300 GeV and for the leading-jet p_T as 0, 100, 200, and 400 GeV. This coarse binning ensures that each of the resulting 16 signal regions is still populated by the signal and background simulation and, moreover, that the precise values of the bin boundaries have no significant impact on the result. Beyond that, no optimisation is attempted. Effects of the Higgs PDF on processes other than tt̄ are neglected. The predicted yields in these regions are estimated for the 136.6 fb⁻¹ LHC Run II scenario at √s = 13 TeV. We consider a ballpark mock-up of the experimental uncertainties that attempts to reflect the uncertainties in a future experimental measurement. The sources of systematic uncertainty we consider are the jet energy scale (≤ 5%), the rate of failure to identify a b jet (≤ 2%), the b-tagging rate of light-flavor jets (≤ 1%), the lepton identification (≤ 1%), and the luminosity measurement (2.6%). The numbers in parentheses are typical upper bounds for the uncertainties in signal and background yields and do not vary strongly across the signal regions. They are obtained by a suitable reweighting of the simulated response and tagging efficiencies. Again, the results do not depend significantly on the details of this scenario.

Table 2: Simulated background processes and event counts. Here, l± = e±, µ±, τ± and ν_l = ν_e, ν_µ, ν_τ. The tt̄ process is simulated to validate the Herwig 7 signal simulation.

process   generated process                          N events   cross section
tt̄        pp → tt̄                                    10⁷        831.76 pb
W+jets    pp → W + jets, W → lν_l                    10⁶        61.5 nb
tW        pp → t l⁻ν̄_l + pp → t̄ l⁺ν_l                10⁶        19.55 pb
tZ        pp → t l⁺l⁻ j + pp → t̄ l⁺l⁻ j              10⁶        0.0758 pb
tt̄Z       pp → tt̄ l⁺l⁻                               10⁶        0.0915 pb
tt̄W       pp → tt̄ l⁻ν̄_l + pp → tt̄ l⁺ν_l              10⁶        0.2043 pb
tWZ       pp → t W l⁺l⁻ + pp → t̄ W l⁺l⁻              10⁶        0.01123 pb
WZ        pp → l⁺ν_l l⁺l⁻ + pp → l⁻ν̄_l l⁺l⁻          10⁶        4.666 pb
Zγ        pp → l⁺l⁻γ + pp → l⁺l⁻γ j                  10⁶        131.3 pb
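The 4 × 4 grid of signal regions in E_T^miss and leading-jet p_T can be sketched in Python as below. We assume that the highest threshold on each axis opens an overflow bin, which is what makes the four thresholds per axis yield the 16 regions quoted in the text, and that each sample is normalised with the per-event weight L·σ/N_generated. The function names are hypothetical.

```python
from bisect import bisect_right

# Lower bin edges in GeV; the last bin on each axis is open-ended (assumption).
MET_EDGES = (0.0, 100.0, 200.0, 300.0)
LEAD_JET_PT_EDGES = (0.0, 100.0, 200.0, 400.0)
N_REGIONS = len(MET_EDGES) * len(LEAD_JET_PT_EDGES)  # 16 signal regions


def signal_region(met, lead_jet_pt):
    """Index 0..15 of the rectangular (E_T^miss, leading-jet pT) signal region."""
    i = bisect_right(MET_EDGES, met) - 1
    j = bisect_right(LEAD_JET_PT_EDGES, lead_jet_pt) - 1
    if i < 0 or j < 0:
        raise ValueError("E_T^miss and pT must be non-negative")
    return i * len(LEAD_JET_PT_EDGES) + j


def event_weight(lumi_inv_pb, xsec_pb, n_generated):
    """Per-event weight that normalises a sample to the integrated luminosity."""
    return lumi_inv_pb * xsec_pb / n_generated


def fill_regions(selected_events, weight):
    """Sum weights of selected events, given as (met, lead_jet_pt) pairs, per region."""
    yields = [0.0] * N_REGIONS
    for met, pt in selected_events:
        yields[signal_region(met, pt)] += weight
    return yields


# Example: the ttbar sample of Table 2 scaled to 136.6 fb^-1 = 136600 pb^-1.
w_tt = event_weight(136600.0, 831.76, 10**7)
toy = fill_regions([(40.0, 120.0), (250.0, 450.0)], w_tt)
```

Events sitting exactly on a threshold are assigned to the upper bin here (`bisect_right`); with the coarse binning used in the text this choice should be immaterial.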
These uncertainties are associated with log-normal nuisance parameters, which are used to construct a likelihood function L(c, θ), where θ labels the set of nuisance parameters. We perform a profiled maximum-likelihood fit and consider

$$q(c) = -2\log\frac{L(c,\hat\theta_c)}{L(0,\hat\theta_0)},$$

where θ̂_c and θ̂_0 are the sets of nuisance parameters maximising the likelihood function at a fixed value of c and for c = 0, respectively.

Figure 6: The left-hand side shows the upper limits at 68% and 95% confidence level as a function of the tuning parameter from the analysis with Delphes for Higgs PDF 2 at c_3 = 35. The right panel shows the resulting optimal Higgs PDF overall size as a function of the Higgs PDF 2 tuning value.

Figure 7: The optimal Higgs PDF 2 at tuning parameter c_3 = 100 and maximal size in comparison to the other PDFs in the employed MMHT'14 set at a Q² roughly corresponding to the Higgs mass.
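The test statistic q(c) defined above can be illustrated with a toy binned counting experiment in Python. This is a minimal sketch under several assumptions that are not taken from the analysis: a single log-normal nuisance parameter (a 2.6% normalisation uncertainty) instead of the full set, yields linear in c (the HH term ∝ c² is dropped, as it is negligible near the limits), an Asimov data set equal to the c = 0 expectation for expected limits, and thresholds q = 1 and q = 3.84 for 68% and 95% CL (one degree of freedom). The bin contents are hypothetical.

```python
import math


def nll(c, theta, data, bkg, sig, kappa):
    """Negative log-likelihood (constants dropped) for binned Poisson counts.

    Expected yield per bin: (bkg + c * sig) * kappa**theta, i.e. a log-normal
    response with a unit-Gaussian constraint on theta.
    """
    value = 0.5 * theta * theta
    scale = kappa ** theta
    for n, b, s in zip(data, bkg, sig):
        mu = (b + c * s) * scale
        value += mu - n * math.log(mu)
    return value


def profiled_nll(c, data, bkg, sig, kappa, lo=-5.0, hi=5.0, iterations=100):
    """Minimise nll over theta by golden-section search (nll is convex in theta)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = nll(c, x1, data, bkg, sig, kappa), nll(c, x2, data, bkg, sig, kappa)
    for _ in range(iterations):
        if f1 < f2:                       # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = nll(c, x1, data, bkg, sig, kappa)
        else:                             # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = nll(c, x2, data, bkg, sig, kappa)
    return nll(c, 0.5 * (a + b), data, bkg, sig, kappa)


def q(c, data, bkg, sig, kappa):
    """q(c) = -2 log[L(c, theta_hat_c) / L(0, theta_hat_0)]."""
    return 2.0 * (profiled_nll(c, data, bkg, sig, kappa)
                  - profiled_nll(0.0, data, bkg, sig, kappa))


def upper_limit(threshold, data, bkg, sig, kappa, c_hi=1e-6):
    """Smallest c > 0 with q(c) = threshold, by bracketing and bisection."""
    for _ in range(200):
        if q(c_hi, data, bkg, sig, kappa) >= threshold:
            break
        c_hi *= 2.0
    else:
        raise RuntimeError("q(c) never reaches the threshold")
    c_lo = 0.0
    for _ in range(100):
        mid = 0.5 * (c_lo + c_hi)
        if q(mid, data, bkg, sig, kappa) < threshold:
            c_lo = mid
        else:
            c_hi = mid
    return 0.5 * (c_lo + c_hi)


# Hypothetical toy inputs (not the yields of this analysis): four signal regions.
bkg = [50000.0, 8000.0, 1200.0, 150.0]   # SM expectation incl. standard ttbar
sig = [2.0e6, 5.0e5, 2.0e5, 8.0e4]       # additional Higgs-induced yield per unit c
data = list(bkg)                         # Asimov data set for c = 0
kappa = 1.026                            # 2.6% log-normal normalisation uncertainty
lim68 = upper_limit(1.0, data, bkg, sig, kappa)
lim95 = upper_limit(3.84, data, bkg, sig, kappa)
print(f"68% CL: c < {lim68:.2e}, 95% CL: c < {lim95:.2e}")
```

In this toy the signal grows towards the high-momentum bins relative to the background, so the nuisance parameter cannot absorb it by a common rescaling, mirroring why the energetic tails drive the limits in the text.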

5 Results
The procedure leads to upper limits on the Higgs content of the proton as a function of the tuning parameter. As an example, Figure 6 (left) shows q(c) for a tuning parameter of 35, with the 68% and 95% confidence levels (CL) indicated. In Figure 6 (right), we show the 68% and 95% CL limits as a function of the tuning parameter. Values of c in excess of 2.5 × 10⁻⁴ are excluded at 95% CL over the considered tuning-parameter range, with the strongest constraints coming from top quarks at high transverse momentum. The importance of the high-energy tails is consistent with the findings in Section 3.4 and suggests that constraints at future facilities such as the HL-LHC scale favorably with luminosity. Because the contribution from the HH initial state is proportional to c², it is negligible at or below the obtained expected limits. While no attempt was made to obtain observed limits, we note that there is currently no significant excess in the tails of the kinematic spectra of top quarks or the tt̄ decay products [36,43], and we therefore expect future studies to find consistent observed limits.

6 Discussion
The results show that, even with the rather crude Higgs PDF employed here, a valence Higgs contribution to the proton, as demanded by field theory, is not excluded. The contribution is substantially smaller than that of even the bottom quark, as shown in Figure 7. However, this is expected, as the heavier the particle, the more strongly its PDF is suppressed. This supports the possibility that a Higgs valence contribution to the proton is indeed present.
Of course, this remains a very crude estimate. Still, this appears to be sufficient motivation to consider the next logical step: a global refit including a Higgs PDF, as well as a Q² evolution of the Higgs PDF. While the LHC just touches upon the relevant energy range, future facilities, especially the HL-LHC, should be able to probe and constrain the Higgs PDF further.
While the results here support the compatibility of a Higgs PDF with the data, they raise two important questions. If a global refit does not yield an essentially vanishing Higgs PDF, in contrast to the results here, then there appears to be an ambiguity in fitting the data with PDFs. This would imply that an additional PDF could so far have been absorbed by the other PDFs, which appears somewhat unlikely given the considerations in [44]. A prerequisite to such a study is to formalize factorization theorems involving electroweak particles in the initial state, for which we have only made a phenomenologically reasonable assumption. Such a treatment should clarify how the actual amplitudes can be evaluated and how NLO QCD corrections can be included for the Higgs-induced processes.
On the other hand, taking the perspective that the field-theoretical motivation for the presence of the Higgs PDF is solid, a failure to fit the data consistently with the additional Higgs PDF could already indicate new physics. After all, the additional Higgs component adds further leading-order interaction channels, which had been absent so far. An example is the aforementioned top spectrum, which is softer than expected [36]. The additional Higgs component makes it even harder to reconcile this spectrum with the SM and thus enlarges the discrepancy between the measurement and the SM.
In addition, choosing suitable kinematic cuts that favor an initial-state Higgs may also extend the reach of searches for new physics that couples dominantly to the Higgs, e.g. dark matter through the Higgs portal.