Strange quark suppression from a simultaneous Monte Carlo analysis of parton distributions and fragmentation functions

We perform the first simultaneous extraction of unpolarized parton distributions and fragmentation functions from a Monte Carlo analysis of inclusive and semi-inclusive deep-inelastic scattering, Drell-Yan lepton-pair production, and single-inclusive $e^+ e^-$ annihilation data. We use data resampling techniques to thoroughly explore the Bayesian posterior distribution of the extracted functions, and use $k$-means clustering on the parameter samples to identify the configurations that give the best description across all reactions. Inclusion of the semi-inclusive data reveals a strong suppression of the strange quark distribution at parton momentum fractions $x \gtrsim 0.01$, in contrast with the ATLAS observation of enhanced strangeness in $W^\pm$ and $Z$ production at the LHC. Our study reveals significant correlations between the strange quark density and the strange $\to$ kaon fragmentation function needed to simultaneously describe semi-inclusive $K^\pm$ production data from COMPASS and inclusive $K^\pm$ spectra in $e^+ e^-$ annihilation from ALEPH and SLD, as well as between the strange and light antiquark densities in the proton.

Resolving the femtoscale structure of the nucleon remains a central mission of ongoing and planned experimental programs at accelerator facilities such as Jefferson Lab, RHIC, COMPASS at CERN, J-PARC, and the future Electron-Ion Collider. In particular, the flavor and spin decomposition of the proton's valence and sea quark densities provides fascinating glimpses into the nonperturbative QCD dynamics that give rise to the rich phenomenology of quark and gluon interactions at long distances. Considerable information has been accumulated from high energy scattering on the proton's u-and dquark parton distribution functions (PDFs) [1][2][3], and more recently on itsū andd content [4][5][6][7]. The quantitative nature of the nonperturbative strange quark sea, on the other hand, has remained obscured from a variety of probes that have attempted to elucidate its structure. This has hampered, for example, the determination of the CKM matrix element V cs , as well as precision determinations of the W -boson mass, which depend on precise knowledge of the strange quark PDF [8,9].
Since the photon couples with equal strength to d and s quarks, it is difficult to disentangle the strange quark properties from the nonstrange using purely inclusive DIS observables, even with proton and neutron targets, without appealing to weak currents to provide independent flavor combinations [10]. The traditional method to determine the strange-quark PDF has been through inclusive charm meson production in charged current neutrino-nucleus DIS. Analyses of the CCFR [11] and NuTeV [12] ν andν cross sections from the Tevatron, and more recently from the CHORUS [13] and NO-MAD [14] experiments at CERN, have yielded a strange to light-antiquark ratio R s = (s +s)/(ū +d) of the order ∼ 0.5. Unfortunately, the interpretation of the neutrinonucleus data suffers from uncertainties in nuclear effects in both the initial and final states: for the former in relating nuclear structure functions to those of free nucleons [15], and for the latter in the treatment of charm quark energy loss and D meson-nucleon interactions during hadronization within the nucleus [16,17].
A method that capitalizes on the unique advantages of weak probes, and at the same time avoids complications due to nuclear effects, is inclusive W ± and Z boson production in pp collisions. Recent data from the ATLAS Collaboration [18,19] at the LHC suggested a surprisingly larger strange quark sea than traditionally obtained from neutrino scattering, with R s ≈ 1.13 at parton momentum fraction x = 0.023 and scale Q 2 = 1.9 GeV 2 . Their latest analysis [19] ("ATLAS-epWZ16") of the W → ν and Z/γ * → data, combined with the HERA runs I and II neutral current and charged current cross sections [20], and assuming s =s, yielded results consistent with the earlier enhancement.
Because the ATLAS-epWZ16 fit [19] uses only HERA and ATLAS data, the light quark sea that emerges has d <ū at x ∼ 0.1, in contrast to the more standard d >ū scenario found from the Fermilab E866 Drell-Yan (DY) experiment [21,22]. In a combined fit to LHC data and charm production from neutrino DIS, Alekhin et al. [9,23,24] argued that the apparent strange quark enhancement was in fact due to the corresponding suppression of thed sea at small x. The ATLAS Z → data were found to disagree with results from CMS [25], which agree with the ABMP16 global QCD analysis [26].
The possible tension between the ATLAS and CMS measurements was investigated in a recent dedicated arXiv:1905.03788v2 [hep-ph] 31 Mar 2020 analysis by Cooper-Sarkar and Wichmann (CSKK) [27], who performed an NNLO fit to the ATLAS and CMS inclusive W ± and Z production data at √ s = 7 and 8 TeV, along with the combined HERA cross sections, using a K-factor approach. The analysis found no significant tension between the HERA, ATLAS and CMS data, and supported an unsuppressed strange PDF at low x. Their standard fit, on the other hand, givesd <ū at x ∼ 0.1, in contradiction with the E866 DY data, although CSKK find that their fit withd forced to be greater thanū reduces R s only by ≈ 10% [27].
From another direction, an independent source of information on the strange-quark PDF at lower energies is semi-inclusive deep-inelastic scattering (SIDIS), in which detection of charged pions or kaons in the final state acts as a flavor tag of the initial state PDFs. Earlier the HERMES Collaboration [28] analyzed K + + K − production data from deuterons, finding a significant rise in the extracted strange PDF at x 0.1 using LO hard coefficients, with a strong suppression at x 0.1. A subsequent analysis [29] using new π and K multiplicity data found a less pronounced rise at small x, but an essentially vanishing strangeness for x > 0.1.
Problems with SIDIS analyses such as that in Ref. [29], which attempt to extract PDF information from a single data set (in this case K production) within an LO framework, were expounded by Stolarski [30], who suggested additional systematic checks of the HERMES analysis with pion production data. Difficulties in describing the HERMES pion multiplicities were also noted by Leader et al. [31,32], who observed that different projections of the 3-dimensional data set (which is a function of the four-momenta of the target, p, virtual photon, q, and produced hadron, p h ) do not give compatible results.
A further strong assumption in Ref. [29] and similar analyses is that the nonstrange PDFs and fragmentation functions (FFs) are sufficiently well known, neglecting possible correlations. It was found in earlier analyses of polarized SIDIS data, however, that assumptions about FFs can lead to significant differences in extracted helicity PDFs [33,34], and that a simultaneous analysis of PDFs and FFs was needed for any definitive conclusion [35]. Aschenauer et al. [36] noted that, while an NLO analysis of semi-inclusive DIS data would be preferred, an LO extraction is an important first step given that "such a procedure using semi-inclusive DIS data is not currently available." Later, Borsa et al. [37] considered the constraining power of SIDIS data on the unpolarized proton PDFs through an iterative reweighting procedure, as a further step towards a full combined global analysis of PDFs and FFs.
In this paper, we undertake such a combined analysis at NLO, taking advantage of recent advances in Bayesian likelihood analysis using Monte Carlo techniques to perform the first global QCD fit that includes SIDIS multiplicities and simultaneously determines unpo-larized PDFs and FFs. Inclusion of the latter in the same global framework is crucial if one is to utilize the SIDIS data without biasing the analysis with ad hoc assumptions about FF parametrizations. An initial attempt at a simultaneous extraction of spin-dependent PDFs and FFs was made in Ref. [38]; however, the unpolarized PDFs there were fixed [39] and spin-averaged SIDIS data were not used in the fit. The present work is the first of its kind to combine the standard DIS and DY data sets used in most global fits [26,[40][41][42][43] to constrain the light-quark PDFs, single-inclusive e + e − annihilation to constrain FFs, and SIDIS multiplicities which are sensitive to both PDFs and FFs. It thus represents an order of magnitude greater challenge than what has ever been attempted before.
For the DIS data sets we include measurements from BCDMS [44], SLAC [45], NMC [46,47] and HERA runs I and II [20]. To apply the standard collinear factorization formalism [48] and avoid power corrections at low energies, we include data that satisfy the cuts on the hadronic final state mass squared W 2 ≡ (p + q) 2 > 10 GeV 2 and four-momentum transfer squared Q 2 > m 2 c , where m c = 1.27 GeV is the charm quark mass. To avoid ambiguities with nuclear effects we do not consider neutrino DIS data, and to clearly isolate the effects of the SIDIS observables on the strange PDF we do not include the high energy LHC data in the present analysis.
While inclusive DIS provides the mainstay observables that constrain the PDF combinations q + ≡ q +q, their ability to isolate sea quark distributions from the valence is rather limited, even with the presence of charged current data from HERA [20]. More direct constraints on the light quark and antiquark sea are provided by the pp and pd DY lepton-pair production data from the Fermilab E866 experiment [21], which involves convolutions of beam and target PDFs sensitive to small and large parton momentum fractions.
Further combinations of PDFs in which quark and antiquark flavors are differentiated can be obtained from SIDIS hadron production reactions, where a hadron h is detected in the final state. In collinear factorization, the cross section for the inclusive production of hadron h is given as a double convolution of the hard scattering cross where z h ≡ p · p h /p · q is the fraction of the virtual photon's momentum carried by h. Data on π ± [49] and K ± production [50] from COMPASS on deuterium are used for 0.2 < z h < 0.8, with the low-z h cut chosen to exclude target fragmentation and the high-z h cut avoids exclusive channels and threshold resummation effects [51,52]. Lower energy SIDIS data from HERMES on hydrogen and deuterium [28,29,53] were also considered. However, questions of compatibility of the [x Bj , z h ] and [Q 2 , z h ] projections of the data [30][31][32] as well as concerns about kinematical mass correction uncertainties [54] at lower Q suggested that effects beyond those included in our present framework may need to be taken into account for a quantitative description. While it is problematic to determine both PDFs and FFs from SIDIS multiplicities alone, more reliable constraints on the FFs can be obtained from hadron production in single-inclusive annihilation (SIA) in e + e − collisions [35,[55][56][57][58][59][60]. As in the previous JAM analysis [35], we consider SIA data from DESY [61][62][63][64], SLAC [65][66][67][68][69], CERN [70][71][72][73][74], and KEK [75] for Q up to ∼ M Z , as well as more recent results from Belle [76,77] and BaBar [78] at Q ≈ 10 GeV.
For the QCD analysis we use hard scattering kernels computed to NLO accuracy in the MS scheme, with the variable flavor number scheme for heavy flavors. As in earlier JAM analyses [35,38,79,80], for the functional form of the distributions we take the standard template, for both PDFs and FFs at the input scale, Q 2 = m 2 c . Typically, one template shape is needed for each nonsinglet and singlet flavor combination. We therefore take one template function for the (nonsinglet) valence u-and d-quark PDFs, which from Regge phenomenology are expected to have a behavior ∼ x −1/2 at low x, and the (singlet) gluon PDF, which is expected to have the more singular ∼ x −1 behavior as x → 0. For the sea quarkd, u, s ands distributions, which are given by combinations of nonsinglet and singlet terms, two shapes are needed: a flavor-symmetric sea-like shape that is dominant at low x, and a valence-like shape that is flavor dependent. For the FFs one template shape was used for each q andq flavor.
As in our previous analyses, we sample the likelihood function by performing multiple χ 2 minimizations that differ by their initial parameters for the gradient search, as well as by the central values of the data which are shifted via data resampling. We use the same χ 2 function as in Refs. [35,38,79,80], which includes correlated systematic uncertainties for each experiment with nuisance parameters treated on the same footing. For the initial analysis we fix the γ and δ shape parameters to zero, giving a total of 52 shape parameters together with 41 nuisance parameters for the systematic uncertainties. However, to explore the possible dependence of our results on the choice of parametrization we also perform fits with γ and δ as free parameters, as we discuss below.
To minimize fitting bias and account for the possibility of multiple minima in the parameter space, we implement Bayesian regression using Monte Carlo methods via data resampling and a comprehensive exploration of parameter space. To this end we devise a multi-step procedure, starting with sampling the posterior distributions for parameters using flat priors for fixed-target DIS data only [44][45][46][47]. The previous step's posterior parameters then become priors for each subsequent step, in which first the DIS data sets are supplemented with the HERA run I and II data [20], followed by the DY pp and pd data [21]. At the next stage we sample the posterior distributions for the FFs using flat priors and SIA data for pions and kaons [61][62][63][64][65][66][67][68][69][70][71][72][73][74][75][76][77][78]. The resulting FF posteriors, together with the PDF posteriors from the previous step, are then fed in a new round where SIDIS pion and kaon data are included along with DIS, DY and SIA.
At this stage we employ a k-means clustering algorithm [81,82] to identify different solutions, and use a sum of reduced χ 2 values per experiment, where N exp is the number of data points in each experiment, as the selection criterion. The use of the quantity χ 2 ensures that the fits provide good descriptions of all data sets, not just those with the most points. To confirm that the final solutions are a faithful representation of the likelihood function in the vicinity of the optimal parameter configuration, we construct flat priors that are confined within the posteriors identified as the best, and then perform a final run. We stress that such an analysis would not have been feasible within a traditional approach with χ 2 minimization, but has become practical within our Monte Carlo strategy.
Our final results are based on a sample of 953 fits to 4,366 data points, giving a mean reduced χ 2 = 1.30 (with individual χ 2 of 1.28 for 2,680 DIS points, 1.25 for 992 SIDIS, 1.67 for 250 DY, and 1.27 for 444 SIA). The resulting PDFs, which we refer to as "JAM19", are illustrated in Fig. 1 at a common scale of Q 2 = 4 GeV 2 . Our results for the nonstrange distributions are generally similar to those obtained by other groups [26,[40][41][42][43]. Some differences appear in the valence u-quark distribution in the region 0.05 x 0.2, where the JAM19 result sits slightly about the others. As discussed below, this appears correlated with the strong suppression of the strange-quark PDF found in our combined analysis of the PDFs and FFs. We note that in the electromagnetic DIS structure functions, which provide the bulk of the constraints on the PDFs in this region, the quark and antiquark distributions enter additively, so that a suppression of the strange PDF will be compensated by a slight enhancement in the valence distributions.
To test the dependence of the valence-quark PDFs on our chosen parametric form, we have also performed a Monte Carlo fit where the polynomial parameters γ and δ in Eq. (2) were allowed to vary. The results indicate that the changes are rather small with the more flexible parametrization, on the scale of the uncertainties, and suggest that our valence PDFs do not depend significantly on whether γ and δ are free parameters or are set to zero. To further explore the flexibility beyond the oneshape scenario with nonzero γ and δ, we also performed fits with two basic template shapes from Eq. (2), with for both the u v and d v distributions. Again, the results were almost indistinguishable from those of our default JAM19 analysis shown in Fig. 1. For thed −ū asymmetry, significant differences exist between our results and the CSKK fit [27], which uses only HERA and LHC results and excludes the fixedtarget DY data [21,22]. The latter force a positive asymmetry peaking at x 0.1, in contrast to the negatived−ū driven by the HERA data. For the gluon distribution at low x the main constraint is from the HERA data.
The most striking result of our analysis is that the strange-quark PDF is significantly reduced compared with that reported by ATLAS [18,19] and the CSKK fit [27]. For the strange to nonstrange ratio, we find R s ≈ 0.2 − 0.3 at x ∼ 0.02, in contrast to values of R s ∼ 1 inferred from the ATLAS data, and closer to those extracted from neutrino experiments. (We note that inclusion of the neutrino-nucleus DIS data, as used by a number of the global QCD analysis groups, would enhance the strange-quark signal up to R s ∼ 0.5, thereby effectively requiring smaller values for the nonstrange or valence quark PDFs to describe the data -see Fig. 1.) The most significant source of the strange suppression is the SIDIS and SIA K production data, as Fig. 2 illustrates. Without these data, the s + PDF is poorly constrained, in contrast to the light flavor sea, which is not strongly affected by the SIDIS multiplicities. Consequently, while the ratio R s varies over a large range without SIDIS (and SIA) data, and at low x is compatible with R s ∼ 1, once those data are included its spread becomes dramatically reduced.
The vital role played by the SIDIS and SIA measurements can be better appreciated from the FFs, shown in Fig. 3, where we compare the full results with those constrained only by SIA data, and with some common FF parametrizations [55,56]. Since the SIA data alone cannot discriminate between q andq fragmentation, we show the FFs for q + → π + , K + . While the pion FFs are generally in better agreement, the kaon FFs display more variation. The χ 2 per datum values from our full fit are 1.07 for the π ± SIA data and 1.48 for the K ± SIA data. For the SIDIS data we find χ 2 /datum of 1.18 for pions and 1.30 for kaons. These values are generally comparable to those found by other groups [55,56,59,60], although in some cases different data are fitted and none of the other analyses performs a simultaneous analysis as we do here.
Our full fits reflect the standard hierarchy of the favored and unfavored fragmentation, with the s + → K + FF larger than the u + → K + , which in turn is larger than the unfavored d + → K + . In contrast, for the SIAonly fits the unfavored D K + d + includes solutions with both soft and hard shapes, the latter being correlated with a smalls → D K + fragmentation. In Ref. [35] the FFs were constrained also by light-flavor tagged SIA data, which are not included here because of potential bias from their reliance on Monte Carlo simulations. Instead, we find that the combination of SIDIS and SIA data forces the favored D K + s to be large at high z, comparable to the DSS fit [55].
found in our combined analysis has major consequences for the strange-quark PDF. Since the K + SIDIS deuterium cross section is given by the flavor combination 2(u+d)D K + u +sD K + s , at moderate x and z 0 it is dominated by the u-quark term. The K − cross section, in contrast, is proportional to the combination 2(ū +d)D K − u + sD K − s , and receives comparable contributions from strange and nonstrange quarks. Because the nonstrange PDFs are much better determined, and the nonstrange favored D K + u = D K − u is well constrained by the SIA and SIDIS data (see Fig. 3), the K + and K − SIDIS multiplicities provide sensitivity to the total strange quark contribution, s D K − s . In practice, the SIDIS data alone admit solutions which have either a relatively small s(x) and large D K − s (z), or a large s(x) and small D K − s (z), as the data/theory ratios in Fig. 4 illustrate. The solutions with the best χ 2 after k-means clustering are illustrated by the red points in Fig. 4, and correspond to the full results (with the small s-quark PDF and large D K − s FF) displayed in Figs. 1-3. The green points in Fig. 4 represent solutions that give equally good descriptions of SIDIS data, but with a large s(x) weighted by a small D K − s (z), which then underestimates the SIA cross sections by ∼ 50% − 100% for large z h values. For example, for the SLD [69] and ALEPH [72] data illustrated in Fig. 4, the best solutions (red) yield an average reduced χ 2 SLD = 1.38 and χ 2 ALEPH = 0.74, but much larger χ 2 values (4.10 and 4.62, respectively) are found for the unfavored solutions (green). Such correlations are symptomatic of the inverse problem for nucleon structure, and our analysis clearly indicates that the path towards its solution must involve simultaneous extraction of all collinear distributions within a single unified framework.
The SIDIS K ± production data could also in principle discriminate between the s ands PDFs, which need not have the same x dependence [83][84][85][86][87][88]. We explored this scenario by parametrizing s(x) ands(x) separately, but found that none of the SIDIS or other data sets showed clear preference for any significant s−s asymmetry within Ratios of data to theory for e + e − → K ± X cross sections versus z h from SLD [69] and ALEPH [72] (top two panels), and for K − production in SIDIS from COMPASS [50] for two representative xBj bins (xBj = 0.02 and 0.12) for the Q 2 ranges indicated (in GeV 2 ). The red points correspond to the average of the best solutions selected by the k-means algorithm, while the green points represent unfavored solutions with smaller D K ± s + (z) and larger s(x).
uncertainties. Future high-precision SIDIS data from Jefferson Lab or from the planned Electron-Ion Collider may allow more stringent determinations of the s and s PDFs [89], as would inclusion of W + charm production data from the LHC [90,91], with better knowledge of c-quark jet fragmentation and hadronization [27].
In addition to collinear distributions, SIDIS data will also provide opportunities in future to study transverse momentum dependent PDFs and FFs, which involve more complicated correlations between the longitudinal and transverse momenta and spins of partons [92]. The methodology developed here for the simultaneous global QCD analysis of different types of distributions will pave the way towards universal analyses of quantum probability distributions that will map out the 3-dimensional structure of the nucleon [93].