Search for $A'\!\to\!\mu^+\mu^-$ decays

Searches are performed for both prompt-like and long-lived dark photons, $A'$, produced in proton-proton collisions at a center-of-mass energy of 13 TeV. These searches look for $A'\!\to\!\mu^+\mu^-$ decays using a data sample corresponding to an integrated luminosity of 5.5/fb collected with the LHCb detector. Neither search finds evidence for a signal, and 90% confidence-level exclusion limits are placed on the $\gamma$-$A'$ kinetic-mixing strength. The prompt-like $A'$ search explores the mass region from near the dimuon threshold up to 70 GeV, and places the most stringent constraints to date on dark photons with $214<m(A') \lesssim 740$ MeV and $10.6<m(A') \lesssim 30$ GeV. The search for long-lived $A'\!\to\!\mu^+\mu^-$ decays places world-leading constraints on low-mass dark photons with lifetimes $\mathcal{O}(1)$ ps.

small $\pm\Delta m$ window around $m(A')$, $n^{\gamma^*}_{\rm ob}[m(A')]$, by [74]
$$n^{A'}_{\rm ex}[m(A'),\varepsilon^2] = \varepsilon^2 \left[\frac{n^{\gamma^*}_{\rm ob}[m(A')]}{2\Delta m}\right] \mathcal{F}[m(A')]\,\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')], \qquad (1)$$
where the dark-photon lifetime, $\tau(A')$, is a known function of $m(A')$ and $\varepsilon^2$, $\mathcal{F}$ is a known $m(A')$-dependent function, and $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')]$ is the $\tau(A')$-dependent ratio of the $A'\!\to\!\mu^+\mu^-$ and $\gamma^*\!\to\!\mu^+\mu^-$ detection efficiencies. For prompt-like dark photons, $A'\!\to\!\mu^+\mu^-$ decays are experimentally indistinguishable from prompt $\gamma^*\!\to\!\mu^+\mu^-$ decays, resulting in $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')] \approx 1$. This facilitates a fully data-driven search where most experimental systematic effects cancel, since the observed $A'\!\to\!\mu^+\mu^-$ yields, $n^{A'}_{\rm ob}[m(A')]$, can be normalized to $n^{A'}_{\rm ex}[m(A'),\varepsilon^2]$ to obtain constraints on $\varepsilon^2$ without any knowledge of the detector efficiency or luminosity. When $\tau(A')$ is larger than the detector decay-time resolution, $A'\!\to\!\mu^+\mu^-$ decays can potentially be reconstructed as displaced from the primary $pp$ vertex (PV), resulting in $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')] \neq 1$; however, only the $\tau(A')$ dependence of the detection efficiency is required to use Eq. (1). Finally, Eq. (1) is altered for large $m(A')$ to account for additional kinetic mixing with the $Z$ boson [82,83].
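As a rough numerical illustration of the normalization in Eq. (1), the sketch below assumes the form used in Ref. [74], $n^{A'}_{\rm ex} = \varepsilon^2\,[n^{\gamma^*}_{\rm ob}/(2\Delta m)]\,\mathcal{F}\,\epsilon^{A'}_{\gamma^*}$, and inverts it for the prompt-like case, where the efficiency ratio does not depend on $\varepsilon^2$. The function names and all input numbers are hypothetical and are not taken from the analysis.

```python
def expected_signal_yield(eps2, n_gamma_obs, delta_m, f_of_m, eff_ratio=1.0):
    """Expected A' -> mu+ mu- yield, assuming the Eq. (1) form
    n_ex = eps2 * [n_gamma_obs / (2 * delta_m)] * F(m) * eff_ratio.

    eps2        : kinetic-mixing strength squared
    n_gamma_obs : observed prompt gamma* -> mu+ mu- yield in the +-delta_m window
    delta_m     : half-width of the mass window
    f_of_m      : value of the known mass-dependent function F at this mass
    eff_ratio   : A'/gamma* efficiency ratio (about 1 for prompt-like A')
    """
    return eps2 * (n_gamma_obs / (2.0 * delta_m)) * f_of_m * eff_ratio


def eps2_upper_limit(n_signal_limit, n_gamma_obs, delta_m, f_of_m, eff_ratio=1.0):
    """Translate an upper limit on the observed signal yield into a limit on eps2.

    Only valid when eff_ratio does not itself depend on eps2 (prompt-like case).
    """
    yield_per_unit_eps2 = expected_signal_yield(
        1.0, n_gamma_obs, delta_m, f_of_m, eff_ratio
    )
    return n_signal_limit / yield_per_unit_eps2
```

Because the yield is linear in $\varepsilon^2$ in this limit, the conversion is a single division; for long-lived dark photons the efficiency ratio depends on $\tau(A')$, and hence on $\varepsilon^2$, so this simple inversion would not apply.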
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range $2<\eta<5$, described in detail in Refs. [84,85]. The prompt-like $A'$ search is based on a data sample that employs a novel data-storage strategy, made possible by advances in the LHCb data-taking scheme introduced in 2015 [86,87], where all online-reconstructed particles are stored, but most lower-level information is discarded, greatly reducing the event size. In contrast, the data sample used in the long-lived $A'$ search is derived from the standard LHCb data stream. Simulated data samples, which are used to validate the analysis, are produced using the software described in Refs. [88-90].
The online event selection is performed by a trigger [91] consisting of a hardware stage using information from the calorimeter and muon systems, followed by a software stage that performs a full event reconstruction. At the hardware stage, events are required to have a muon with momentum transverse to the beam direction $p_{\rm T}(\mu) \gtrsim 1.8$ GeV, or a dimuon pair with $p_{\rm T}(\mu^+)p_{\rm T}(\mu^-) \gtrsim (1.5\,{\rm GeV})^2$. The long-lived $A'$ search also uses events selected at the hardware stage independently of the $A'\!\to\!\mu^+\mu^-$ candidate. In the software stage, where the $p_{\rm T}$ resolution is substantially improved relative to the hardware stage, $A'\!\to\!\mu^+\mu^-$ candidates are built from two oppositely charged tracks that form a good-quality vertex and satisfy stringent muon-identification criteria, though these criteria were loosened considerably in the low-mass region during 2017-2018 data taking. Both searches require $p_{\rm T}(A') > 1$ GeV and $2 < \eta(\mu) < 4.5$. The prompt-like $A'$ search uses muons that are consistent with originating from the PV, with $p_{\rm T}(\mu) > 1.0$ GeV and momentum $p(\mu) > 20$ GeV in 2016, and $p_{\rm T}(\mu) > 0.5$ GeV, $p(\mu) > 10$ GeV, and $p_{\rm T}(\mu^+)p_{\rm T}(\mu^-) > (1.0\,{\rm GeV})^2$ in 2017-2018. The long-lived $A'$ search uses muons that are inconsistent with originating from any PV with $p_{\rm T}(\mu) > 0.5$ GeV and $p(\mu) > 10$ GeV, and requires $2 < \eta(A') < 4.5$ and a decay topology consistent with a dark photon originating from a PV.
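The year-dependent muon requirements of the prompt-like search can be summarized in a short predicate. This is only a sketch of the kinematic thresholds quoted above: the `Muon` class and function name are hypothetical, and the $p_{\rm T}(A')$, PV-consistency, vertex-quality, and muon-identification requirements are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Muon:
    pt: float   # transverse momentum in GeV
    p: float    # momentum in GeV
    eta: float  # pseudorapidity


def passes_prompt_like_kinematics(mu_plus, mu_minus, year):
    """Apply the prompt-like muon kinematic thresholds quoted in the text."""
    if year == 2016:
        pt_min, p_min, pt_product_min = 1.0, 20.0, None
    elif year in (2017, 2018):
        pt_min, p_min, pt_product_min = 0.5, 10.0, 1.0 ** 2
    else:
        raise ValueError("only 2016-2018 data are described")

    for mu in (mu_plus, mu_minus):
        # Both searches require 2 < eta(mu) < 4.5.
        if not (2.0 < mu.eta < 4.5):
            return False
        if mu.pt <= pt_min or mu.p <= p_min:
            return False

    # The pT-product requirement applies only in 2017-2018.
    if pt_product_min is not None and mu_plus.pt * mu_minus.pt <= pt_product_min:
        return False
    return True
```

For example, a pair with $p_{\rm T}$ of 0.8 and 1.5 GeV fails the 2016 thresholds but can pass the looser 2017-2018 ones, which illustrates how the later data extend the reach at low mass.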
The prompt-like $A'$ sample is contaminated by prompt $\gamma^*\!\to\!\mu^+\mu^-$ production, various resonant decays to $\mu^+\mu^-$, whose mass-peak regions are avoided in the search, and by the following types of misreconstruction: ($hh$) two prompt hadrons misidentified as muons; ($h\mu_Q$) a misidentified prompt hadron combined with a muon produced in the decay of a heavy-flavor quark, $Q$, that is misidentified as prompt; and ($\mu_Q\mu_Q$) two muons produced in $Q$-hadron decays that are both misidentified as prompt. The impact of the $\gamma^*\!\to\!\mu^+\mu^-$ background is reduced, cf. Ref. [81], by constraining the muons to originate from the PV when determining $m(\mu^+\mu^-)$, which improves the resolution, $\sigma[m(\mu^+\mu^-)]$, by about a factor of 2 for small $m(A')$. The misreconstructed backgrounds are highly suppressed by the stringent muon-identification and prompt-like requirements applied in the trigger; however, substantial contributions remain for $m(A') \gtrsim 1.1$ GeV. In this mass region, dark photons are expected to be predominantly produced in Drell-Yan processes, from which they would inherit the well-known signature of dimuon pairs that are largely isolated. Therefore, the signal sensitivity is enhanced by applying the anti-$k_{\rm T}$-based [92-94] isolation requirement described in Refs. [81,95] for $m(A') > 1.1$ GeV.
The observed prompt-like $A'\!\to\!\mu^+\mu^-$ yields, which are determined from fits to the $m(\mu^+\mu^-)$ spectrum, are normalized using Eq. (1) to obtain constraints on $\varepsilon^2$. The $n^{\gamma^*}_{\rm ob}[m(A')]$ values in Eq. (1) are obtained from binned extended maximum likelihood fits to the $\min[\chi^2_{\rm IP}(\mu^\pm)]$ distributions, where $\chi^2_{\rm IP}(\mu)$ is defined as the difference in the vertex-fit $\chi^2$ when the PV is reconstructed with and without the muon. The $\min[\chi^2_{\rm IP}(\mu^\pm)]$ distribution provides excellent discrimination between prompt muons and the displaced muons that constitute the $\mu_Q\mu_Q$ background. Since $\chi^2_{\rm IP}(\mu)$ approximately follows a $\chi^2$ probability density function (PDF) with two degrees of freedom, the $\min[\chi^2_{\rm IP}(\mu^\pm)]$ distributions have minimal mass dependence for each source of dimuon candidates. The prompt-dimuon PDFs are taken directly from data at $m(J/\psi)$ and $m(Z)$, where prompt resonances are dominant. Small corrections are applied to obtain these PDFs at all other $m(A')$, which are validated near threshold, at $m(\phi)$, and at $m[\Upsilon(1S)]$, where the data predominantly consist of prompt dimuon pairs. Based on these validation studies, a small shape uncertainty is applied in each $\min[\chi^2_{\rm IP}(\mu^\pm)]$ bin. Same-sign $\mu^\pm\mu^\pm$ candidates provide estimates for the PDF and yield of the sum of the $hh$ and $h\mu_Q$ contributions, where each involves misidentified prompt hadrons. The $\mu^\pm\mu^\pm$ yields are corrected to account for the difference in the production rates of $\pi^+\pi^-$ and $\pi^\pm\pi^\pm$, since the $hh$ background largely consists of $\pi^+\pi^-$ pairs where both pions are misidentified. The uncertainty due to the finite size of the $\mu^\pm\mu^\pm$ sample in each bin is included in the likelihood. Simulated $Q$-hadron decays are used to obtain the $\mu_Q\mu_Q$ PDFs, where the dominant uncertainties are from the relative importance of the various $Q$-hadron decay contributions at each mass. Example $\min[\chi^2_{\rm IP}(\mu^\pm)]$ fits, and the resulting prompt-like candidate categorization versus $m(\mu^+\mu^-)$, are provided in Ref. [95]. Finally, the $n^{\gamma^*}_{\rm ob}[m(A')]$ yields are corrected for bin migration due to bremsstrahlung, which is negligible except near the low-mass tails of the $J/\psi$ and $\Upsilon(1S)$, and the small expected Bethe-Heitler contribution is subtracted [74], resulting in the $n^{A'}_{\rm ex}[m(A'),\varepsilon^2]$ values shown in Fig. 1.
The prompt-like mass spectrum is scanned in steps of $\sigma[m(\mu^+\mu^-)]/2$ searching for $A'\!\to\!\mu^+\mu^-$ contributions [95], using the strategy from Ref. [81]. At each mass, a binned extended maximum likelihood fit is performed in a $\pm 12.5\,\sigma[m(\mu^+\mu^-)]$ window around $m(A')$. The profile likelihood is used to determine the $p$-value and the upper limit at 90% confidence level (CL) on $n^{A'}_{\rm ob}[m(A')]$. The signal mass resolution is determined with 10% precision using a combination of simulated $A'\!\to\!\mu^+\mu^-$ decays and the observed $p_{\rm T}$-dependent widths of the large resonance peaks in the data. The method of Ref. [96] selects the background model from a large set of potential components, which includes all Legendre modes up to tenth order and dedicated terms for known resonances, by performing a data-driven process whose uncertainty is included in the profile likelihood following Ref. [97]. No significant excess is found in the prompt-like $m(A')$ spectrum, after accounting for the trials factor due to the number of signal hypotheses.
Dark photons are excluded at 90% CL where the upper limit on $n^{A'}_{\rm ob}[m(A')]$ is less than $n^{A'}_{\rm ex}[m(A'),\varepsilon^2]$. Figure 2 shows that the constraints placed on prompt-like dark photons are the most stringent for $214 < m(A') \lesssim 740$ MeV and $10.6 < m(A') \lesssim 30$ GeV. The low-mass constraints are the strongest placed by a prompt-like $A'$ search at any $m(A')$. These results are corrected for inefficiency that arises due to $\tau(A')$ no longer being negligible at such small values of $\varepsilon^2$. The high-mass constraints are adjusted to account for additional kinetic mixing with the $Z$ boson [82,83], which alters Eq. (1).
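The prompt-like mass scan (steps of $\sigma[m(\mu^+\mu^-)]/2$, each fit performed in a $\pm 12.5\,\sigma$ window) can be sketched as a simple loop. The resolution model is not given in the text, so `toy_resolution` below is a hypothetical placeholder, and the function names are illustrative only.

```python
def scan_windows(m_min, m_max, resolution, step_in_sigma=0.5,
                 half_window_in_sigma=12.5):
    """Return (mass, window_low, window_high) tuples for the mass scan.

    `resolution` is a caller-supplied function returning sigma[m(mu+mu-)]
    at a given mass; the real resolution model is not reproduced here.
    """
    windows = []
    mass = m_min
    while mass <= m_max:
        sigma = resolution(mass)
        windows.append((mass,
                        mass - half_window_in_sigma * sigma,
                        mass + half_window_in_sigma * sigma))
        # Advance by half the local mass resolution.
        mass += step_in_sigma * sigma
    return windows


def toy_resolution(mass):
    # Hypothetical placeholder: a constant 0.5% relative mass resolution.
    return 0.005 * mass
```

Because neighbouring scan points are separated by only half the resolution, their fit windows overlap almost completely, which is why a trials factor has to be evaluated rather than simply counting hypotheses as independent.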
Since the LHCb detector response is independent of which Drell-Yan process produces the dark photon above 10 GeV, it is straightforward to recast the results in Fig. 2 for other models [98].
For the long-lived $A'$ search, contamination from prompt particles is negligible due to the stringent criteria applied in the trigger. Therefore, the dominant background contributions are: photons that convert into $\mu^+\mu^-$ in the silicon-strip vertex detector that surrounds the $pp$ interaction region, known as the VELO [100]; $b$-hadron decay chains that produce two muons; and the low-mass tail from $K^0_{\rm S}\!\to\!\pi^+\pi^-$ decays, where both pions are misidentified as muons. A $p$-value is assigned to the photon-conversion hypothesis for each long-lived $A'\!\to\!\mu^+\mu^-$ candidate using properties of the decay vertex and muon tracks, along with a high-precision three-dimensional material map produced from a data sample of secondary hadronic interactions [101]. An $m(A')$-dependent requirement is applied to these $p$-values to reduce conversions to a negligible level. The remaining backgrounds are highly suppressed by the decay topology requirement applied in the trigger. Furthermore, since muons produced in $b$-hadron decays are often accompanied by additional displaced tracks, events are rejected if they are selected by the inclusive heavy-flavor software trigger [102,103] independently of the presence of the $A'\!\to\!\mu^+\mu^-$ candidate. In addition, boosted decision tree classifiers are used to reject events containing tracks consistent with originating from the same $b$-hadron decay as the signal muon candidates [104].
The long-lived $A'$ search is also normalized using Eq. (1); however, $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')]$ is not unity, in part because the efficiency depends on the decay time, $t$. The kinematics are identical for $A'\!\to\!\mu^+\mu^-$ and prompt $\gamma^*\!\to\!\mu^+\mu^-$ decays for $m(A') = m(\gamma^*)$; therefore, the $t$ dependence of $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')]$ is obtained by resampling prompt $\gamma^*\!\to\!\mu^+\mu^-$ candidates as long-lived $A'\!\to\!\mu^+\mu^-$ decays, where all $t$-dependent properties, e.g. $\min[\chi^2_{\rm IP}(\mu^\pm)]$, are recalculated based on the resampled decay-vertex locations (the impact of background contamination in the prompt $\gamma^*\!\to\!\mu^+\mu^-$ sample is negligible). This approach is validated using simulation, where prompt $A'\!\to\!\mu^+\mu^-$ decays are used to predict the properties of long-lived $A'\!\to\!\mu^+\mu^-$ decays. The relative uncertainty on $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')]$ is estimated to be 5%, which arises largely due to limited knowledge of how radiation damage affects the performance of the VELO as a function of the distance from the $pp$ interaction region. The looser kinematic, muon-identification, and hardware-trigger requirements applied to long-lived $A'\!\to\!\mu^+\mu^-$ candidates, cf. prompt-like candidates, also increase the efficiency. This $t$-independent increase in efficiency is determined using a control data sample of dimuon candidates consistent with originating from the PV, but otherwise satisfying the long-lived criteria. The $n^{A'}_{\rm ex}[m(A'),\varepsilon^2]$ values obtained using these data-driven $\epsilon^{A'}_{\gamma^*}[m(A'),\tau(A')]$ values, along with the expected prompt-like $A'\!\to\!\mu^+\mu^-$ yields in Fig. 1, are shown in Fig. 3.
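The idea of resampling prompt candidates as long-lived decays can be illustrated with a toy that assigns each prompt candidate an exponentially distributed decay time and the corresponding flight distance, $(p/m)\,c\,t$. This is only a sketch under simple assumptions: the function names are hypothetical, and the recalculation of detector-level quantities such as $\min[\chi^2_{\rm IP}(\mu^\pm)]$ from the displaced vertex is not modeled.

```python
import random

C_MM_PER_PS = 0.299792458  # speed of light in mm/ps


def sample_decay_time(tau_ps, rng):
    """Draw a proper decay time (ps) from an exponential with mean tau_ps."""
    return rng.expovariate(1.0 / tau_ps)


def flight_distance_mm(p_gev, m_gev, t_ps):
    """Lab-frame flight distance (p/m) * c * t, where p/m = beta * gamma."""
    return (p_gev / m_gev) * C_MM_PER_PS * t_ps


def resample_as_long_lived(prompt_candidates, tau_ps, seed=1):
    """Assign each prompt candidate (p_gev, m_gev) a toy decay time and distance."""
    rng = random.Random(seed)
    resampled = []
    for p_gev, m_gev in prompt_candidates:
        t_ps = sample_decay_time(tau_ps, rng)
        resampled.append((t_ps, flight_distance_mm(p_gev, m_gev, t_ps)))
    return resampled
```

With a lifetime of order 1 ps and a boost of a few hundred, typical flight distances come out at the centimetre scale, which is why the decay-vertex position relative to the VELO material matters for this search.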
The long-lived $m(A')$ spectrum is also scanned in discrete steps of $\sigma[m(\mu^+\mu^-)]/2$ looking for $A'\!\to\!\mu^+\mu^-$ contributions [95]; however, discrete steps in $\tau(A')$ are also considered here. Binned extended maximum likelihood fits are performed to the three-dimensional feature space of $m(\mu^+\mu^-)$, $t$, and the consistency of the decay topology as quantified in the decay-fit $\chi^2_{\rm DF}$, which has three degrees of freedom. The photon-conversion contribution is derived in each $[m(\mu^+\mu^-), t, \chi^2_{\rm DF}]$ bin from the number of dimuon candidates that are rejected by the conversion criterion. Both the $b$-hadron and $K^0_{\rm S}$ contributions are modeled in each $[t, \chi^2_{\rm DF}]$ bin by second-order polynomials of the energy released in the decay, $\sqrt{m(\mu^+\mu^-)^2 - 4m(\mu)^2}$. These contributions are validated using the following large control data samples: candidates that fail the $b$-hadron suppression requirements; and candidates that fail, but nearly satisfy, the stringent muon-identification requirements. The profile likelihood is used to obtain the $p$-values and confidence intervals on $n^{A'}_{\rm ob}[m(A'),\tau(A')]$. No significant excess is observed in the long-lived $A'\!\to\!\mu^+\mu^-$ search (the three-dimensional data distribution and the background-only pull distributions are provided in Ref. [95]).
Since the relationship between $\tau(A')$ and $\varepsilon^2$ is known at each mass [74], the upper limits on $n^{A'}_{\rm ob}[m(A'),\tau(A')]$ are translated into limits on $n^{A'}_{\rm ob}[m(A'),\varepsilon^2]$. Figure 4 shows that sizable regions of $[m(A'),\varepsilon^2]$ parameter space are excluded, which are much larger than those excluded by LHCb in Ref. [81]. Furthermore, most of the parameter space shown in Fig. 4 would have been accessible if the data sample were roughly three times larger. The expected number of recorded $A'\!\to\!\mu^+\mu^-$ decays should increase by a factor of $\mathcal{O}(100)$ in the data sample to be collected in Run 3 by the upgraded LHCb detector.
In summary, searches are performed for prompt-like and long-lived dark photons produced in $pp$ collisions at a center-of-mass energy of 13 TeV. Both searches look for $A'\!\to\!\mu^+\mu^-$ decays using a data sample corresponding to an integrated luminosity of $5.5\,{\rm fb}^{-1}$ collected with the LHCb detector during 2016-2018. The three-fold increase in integrated luminosity, improved trigger efficiency during 2017-2018 data taking, and improvements in the analysis result in the searches presented in this Letter achieving much better sensitivity to dark photons than the previous LHCb results [81]. The prompt-like $A'$ search achieves a factor of 5 (2) better sensitivity to $\varepsilon^2$ at low (high) masses than Ref. [81], while the long-lived $A'$ search provides access to much larger regions of $[m(A'),\varepsilon^2]$ parameter space.
No evidence for a signal is found in either search, and 90% CL exclusion regions are set on the $\gamma$-$A'$ kinetic-mixing strength. The prompt-like $A'$ search is performed from near the dimuon threshold up to 70 GeV, and produces the most stringent constraints on dark photons with $214 < m(A') \lesssim 740$ MeV and $10.6 < m(A') \lesssim 30$ GeV. The long-lived $A'$ search is restricted to the mass range $214 < m(A') < 350$ MeV, where the data sample potentially has sensitivity, and places world-leading constraints on low-mass dark photons with lifetimes $\mathcal{O}(1)$ ps. These results demonstrate the unique sensitivity of the LHCb experiment to dark photons, even using a data sample collected with a hardware-trigger stage that is highly inefficient for low-mass $A'\!\to\!\mu^+\mu^-$ decays. The removal of this hardware-trigger stage in Run 3, along with the planned increase in luminosity, should greatly increase the potential yield of $A'\!\to\!\mu^+\mu^-$ decays in the low-mass region compared to the 2016-2018 data sample, and therefore, greatly increase the dark-photon discovery potential of the LHCb experiment.

Supplemental Material for LHCb-PAPER-2019-031
Search for $A'\!\to\!\mu^+\mu^-$ decays
Additional details and figures for the prompt-like and long-lived searches are presented in this Supplemental Material.

Isolation Criterion
For masses above the $\phi(1020)$ meson mass, dark photons are expected to be predominantly produced in Drell-Yan processes in $pp$ collisions at the LHC. A well-known signature of Drell-Yan production is dimuon pairs that are largely isolated, and a high-mass dark photon would inherit this property. The signal sensitivity for $m(A') > 1.1$ GeV is enhanced by applying the jet-based isolation requirement used in Ref. [81]. Jet reconstruction is performed by clustering charged and neutral particle-flow candidates [94] using the anti-$k_{\rm T}$ clustering algorithm [92] with $R = 0.5$ as implemented in FastJet [93]. Muons with $p_{\rm T}(\mu)/p_{\rm T}({\rm jet}) < 0.7$ are rejected, where the contribution to $p_{\rm T}({\rm jet})$ from the other muon is excluded if both muons are clustered in the same jet, as this is found to provide nearly optimal sensitivity for all $m(A') > m(\phi)$.
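The isolation requirement reduces to a ratio test once the jets are available. The sketch below assumes the jet clustering has already been done elsewhere and takes only scalar transverse momenta as input; subtracting the other muon's $p_{\rm T}$ as a scalar is a simplification of removing its contribution from the jet, and the function name is hypothetical.

```python
def passes_isolation(pt_mu, pt_jet, pt_other_mu_in_jet=0.0, min_fraction=0.7):
    """Return True if the muon carries at least `min_fraction` of its jet pT.

    pt_mu              : muon transverse momentum (GeV)
    pt_jet             : transverse momentum of the jet containing the muon (GeV)
    pt_other_mu_in_jet : pT of the other signal muon if it was clustered into
                         the same jet, else 0 (scalar subtraction is a
                         simplification made for this sketch)
    """
    pt_jet_corrected = pt_jet - pt_other_mu_in_jet
    if pt_jet_corrected <= 0.0:
        raise ValueError("corrected jet pT must be positive")
    # Muons with pT(mu)/pT(jet) < 0.7 are rejected.
    return pt_mu / pt_jet_corrected >= min_fraction
```

Excluding the partner muon matters mostly at low dimuon mass, where the two muons are close together and would otherwise make each other look non-isolated.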

Prompt-Like Fits
The prompt-like fit strategy is the same as described in detail in the Supplemental Material of Ref. [81]. This strategy was first introduced in Ref. [96], where it is denoted by aic-o. The $m(\mu^+\mu^-)$ spectrum is scanned in steps of $\sigma[m(\mu^+\mu^-)]/2$ searching for $A'\!\to\!\mu^+\mu^-$ contributions. At each mass, a binned extended maximum likelihood fit is performed, and the profile likelihood is used to determine the $p$-value and the confidence interval on $n^{A'}_{\rm ob}[m(A')]$. The prompt-like-search trials factor is obtained using pseudoexperiments. As in Ref. [96], each fit is performed in a $\pm 12.5\,\sigma[m(\mu^+\mu^-)]$ window around the scan-mass value using bins with widths of $\sigma[m(\mu^+\mu^-)]/20$. Near threshold, the energy released in the decay, $\sqrt{m(\mu^+\mu^-)^2 - 4m(\mu)^2}$, is used instead of the mass since it is easier to model. The confidence intervals are defined using the bounded likelihood approach, which involves taking $\Delta \log \mathcal{L}$ relative to zero signal, rather than the best-fit value, if the best-fit signal value is negative. This approach enforces that only physical (nonnegative) upper limits are placed on $n^{A'}_{\rm ob}[m(A')]$, and prevents defining exclusion regions that are much better than the experimental sensitivity in cases where a large deficit in the background yield is observed.
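The bounded likelihood prescription can be illustrated with a one-parameter toy. The sketch below assumes a profiled negative log-likelihood that rises monotonically above the reference point; the threshold on $2\Delta(-\log\mathcal{L})$ is left as an input because its value is not quoted here, and `gaussian_nll` is a hypothetical stand-in for the real profile likelihood.

```python
def gaussian_nll(s_hat, sigma):
    """Toy -log L for a signal yield with best fit s_hat and uncertainty sigma."""
    return lambda s: 0.5 * ((s - s_hat) / sigma) ** 2


def bounded_upper_limit(nll, best_fit, delta_2nll, step=1.0, tol=1e-9):
    """Upper limit on a signal yield using the bounded-likelihood prescription.

    nll        : function returning -log L(s), already profiled over nuisances
    best_fit   : signal yield that minimizes nll (may be negative)
    delta_2nll : threshold on 2 * delta(-log L); its value is the caller's choice
    """
    # Reference point: the best fit, or zero signal if the best fit is negative.
    s_ref = max(best_fit, 0.0)
    nll_ref = nll(s_ref)

    def excess(s):
        return 2.0 * (nll(s) - nll_ref) - delta_2nll

    # Bracket the crossing above the reference point, then bisect.
    low, high = s_ref, s_ref + step
    while excess(high) < 0.0:
        high = s_ref + 2.0 * (high - s_ref)
    while high - low > tol:
        mid = 0.5 * (low + high)
        if excess(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
```

For a toy with best fit $-3$ and uncertainty 2, a threshold of 1 measured from the best fit would give an unphysical limit of $-1$, whereas measuring from zero signal gives $\sqrt{13}-3 \approx 0.61$, a nonnegative limit as intended.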
The signal models are determined at each $m(A')$ using a combination of simulated $A'\!\to\!\mu^+\mu^-$ decays and the widths of the large resonance peaks that are clearly visible in the data. The background models take as input a large set of potential background components, then the data-driven model-selection process of Ref. [96] is performed, whose uncertainty is included in the profile likelihood following Ref. [97]. In this analysis, the set of possible background components includes all Legendre modes with $\ell \leq 10$ at every $m(A')$. Additionally, dedicated background components are included for sizable narrow SM resonance contributions. The use of 11 Legendre modes adequately describes every double-misidentified peaking background that contributes at a significant level, and therefore, these do not require dedicated background components. In mass regions where such complexity is not required, the data-driven model-selection procedure reduces the complexity, which increases the sensitivity to a potential signal contribution. As in Ref. [96], all fit regions are transformed onto the interval $[-1, 1]$, where the scan $m(A')$ value maps to zero. After such a transformation, the signal model is (approximately) an even function; therefore, odd Legendre modes are orthogonal to the signal component, which means that the presence of odd modes has minimal impact on the variance of $n^{A'}_{\rm ob}[m(A')]$. In the prompt-like fits, all odd Legendre modes up to ninth order are included in every background model, while only a subset of the even modes is selected for inclusion in each fit.
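The orthogonality between odd Legendre modes and an even signal shape is easy to check numerically. The sketch below uses a Gaussian centered at zero as a stand-in for the signal model, with a width of $1/12.5$ because the $\pm 12.5\,\sigma$ fit window is mapped onto $[-1, 1]$; the Gaussian shape and the function names are assumptions made for illustration.

```python
import math


def legendre(order, x):
    """Legendre polynomial P_order(x) from the standard three-term recurrence."""
    p_prev, p_curr = 1.0, x
    if order == 0:
        return p_prev
    for n in range(1, order):
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def overlap_with_signal(order, width=1.0 / 12.5, n_points=20000):
    """Midpoint-rule integral of P_order(x) * Gaussian(x; 0, width) over [-1, 1]."""
    h = 2.0 / n_points
    total = 0.0
    for i in range(n_points):
        x = -1.0 + (i + 0.5) * h
        total += legendre(order, x) * math.exp(-0.5 * (x / width) ** 2)
    return total * h
```

Evaluating `overlap_with_signal` for odd orders gives values consistent with zero, while even orders give nonzero overlaps, which is why only the even modes compete with the signal and need to be selected carefully.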
Regions in the mass spectrum where large known resonance contributions are observed are vetoed in the prompt-like $A'$ search. Furthermore, the regions near the $\eta$ meson and the excited $\Upsilon$ states (beyond the $\Upsilon(3S)$ meson) are treated specially. For example, since it is not possible to distinguish between $A'\!\to\!\mu^+\mu^-$ and $\eta\!\to\!\mu^+\mu^-$ contributions at $m(\eta)$, the $p$-values near this mass are ignored. The small excess at $m(\eta)$ is treated as signal when setting the limits on $n^{A'}_{\rm ob}[m(A')]$, which is conservative in that an $\eta\!\to\!\mu^+\mu^-$ contribution will weaken the constraints on $A'\!\to\!\mu^+\mu^-$ decays. The same strategy is used near the excited $\Upsilon$ masses.

Long-Lived Fits
The long-lived fit strategy is also the same as described in the Supplemental Material to Ref. [81]. The signal yields are determined from binned extended maximum likelihood fits performed on all long-lived $A'\!\to\!\mu^+\mu^-$ candidates using the three-dimensional feature space of $m(\mu^+\mu^-)$, $t$, and $\chi^2_{\rm DF}$. As in the prompt-like $A'$ search, a scan is performed in discrete steps of $\sigma[m(\mu^+\mu^-)]/2$; however, in this case, discrete steps in $\tau(A')$ are also considered. The profile likelihood is again used to obtain the $p$-values and the confidence intervals on $n^{A'}_{\rm ob}[m(A'),\tau(A')]$. The binning scheme involves four bins in $\chi^2_{\rm DF}$: [0,2], [2,4], [4,6], and [6,8]; and bins in $t$: …, [3,5], [5,10], and $>10$ ps. The binning scheme used for $m(\mu^+\mu^-)$ depends on the scan $m(A')$ value, and is chosen such that the vast majority of the signal falls into a single bin. Signal decays mostly have small $\chi^2_{\rm DF}$ values, with about 50% (80%) of $A'\!\to\!\mu^+\mu^-$ decays satisfying $\chi^2_{\rm DF} < 2$ (4). Background from $b$-hadron decays populates the small $t$ region and is roughly uniformly distributed in $\chi^2_{\rm DF}$, whereas background from $K^0_{\rm S}$ decays is signal-like in $\chi^2_{\rm DF}$ and roughly uniformly distributed in $t$. Figure S3 shows the three-dimensional distribution of all long-lived $A'\!\to\!\mu^+\mu^-$ candidates.
The expected contribution in each bin from photon conversions is derived from the number of candidates rejected by the conversion criterion. Two large control data samples are used to develop and validate the modeling of the $b$-hadron and $K^0_{\rm S}$ contributions. Both contributions are well modeled by second-order polynomials in $\sqrt{m(\mu^+\mu^-)^2 - 4m(\mu)^2}$. While no evidence for $t$ or $\chi^2_{\rm DF}$ dependence is observed for these parameters in either the $b$-hadron or $K^0_{\rm S}$ control sample, all parameters are allowed to vary independently in each $[t, \chi^2_{\rm DF}]$ region in the fits used in the long-lived $A'$ search. Figure S4 shows the long-lived $A'\!\to\!\mu^+\mu^-$ candidates, along with the pull values obtained from fits performed to the data where no signal contributions are included. These distributions are consistent with those observed in pseudoexperiments where no signal component is generated.

Additional Figures
This section provides additional figures from the analysis, and more detailed comparisons of the new constraints presented in the Letter with previous results.
… $\min[\chi^2_{\rm IP}(\mu^\pm)]$ is used in the fits to increase the bin occupancies at large $\min[\chi^2_{\rm IP}(\mu^\pm)]$ values. The background categories, as described in the Letter, are as follows: ($hh$) two prompt hadrons misidentified as muons; ($h\mu_Q$) a misidentified prompt hadron combined with a muon produced in the decay of a heavy-flavor quark, $Q$, that is misidentified as prompt; and ($\mu_Q\mu_Q$) two muons produced in $Q$-hadron decays that are both misidentified as prompt.
Figure S2: Prompt-like mass spectrum, where the categorization of the data as prompt $\mu^+\mu^-$, $\mu_Q\mu_Q$, and $hh + h\mu_Q$ is determined using the $\min[\chi^2_{\rm IP}(\mu^\pm)]$ fits described in the text (examples of these fits are shown in Fig. S1). The anti-$k_{\rm T}$-based isolation requirement is applied for $m(A') > 1.1$ GeV.
Figure S3: Three-dimensional distribution of $\chi^2_{\rm DF}$ versus $t$ versus $m(\mu^+\mu^-)$, which is fit to determine the long-lived signal yields. The data are consistent with being predominantly due to $b$-hadron decays at small $t$, and due to $K^0_{\rm S}$ decays for large $t$ and $m(\mu^+\mu^-) \gtrsim 280$ MeV. The remaining few candidates at large $t$ and small $m(\mu^+\mu^-)$ are likely photon conversions.
Figure S6: Comparison of the results presented in this Letter to existing constraints from previous experiments in the few-loop $\varepsilon$ region (see Ref. [98] for details about previous experiments), restricted to the mass region motivated by self-interacting dark matter [4].