Deep Learning and AdS/CFT

We present a deep neural network representation of the AdS/CFT correspondence, and demonstrate the emergence of the bulk metric function via the learning process for given data sets of the response in boundary quantum field theories. The emergent radial direction of the bulk is identified with the depth of the layers, and the network itself is interpreted as a bulk geometry. Our network provides a data-driven holographic modeling of strongly coupled systems. Using a scalar $\phi^4$ theory with unknown mass and coupling, in an unknown curved spacetime with a black hole horizon, we demonstrate that our deep learning (DL) framework can determine the bulk quantities which fit given response data. First, we show that, from boundary data generated by the AdS Schwarzschild spacetime, our network can reproduce the metric. Second, we demonstrate that our network with experimental data as an input can determine the bulk metric, the mass and the quartic coupling of the holographic model. As an example we use the experimental data of the magnetic response of the strongly correlated material Sm$_{0.6}$Sr$_{0.4}$MnO$_3$. This AdS/DL correspondence not only enables gravity modeling of strongly correlated systems, but also sheds light on a hidden mechanism of the emerging space in both AdS and DL.

Introduction.-The AdS/CFT correspondence [1][2][3], a renowned holographic relation between d-dimensional quantum field theories (QFTs) and (d + 1)-dimensional gravity, has been widely applied to strongly coupled QFTs including QCD and condensed matter systems. For phenomenology, holographic modeling has been successful only for a restricted class of systems in which symmetries are manifest, mainly because the mechanism of how the holography works is still unknown. For a given quantum system, we do not know whether its gravity dual exists or how we can construct a holographic model. Suppose one is given experimental data of the linear/nonlinear response under some external field; can one model it holographically? In this letter we employ deep learning (DL) [4][5][6], an active subject of computational science, to provide a data-driven holographic gravity modeling of strongly coupled quantum systems. While conventional holographic modeling starts with a given bulk gravity metric, our novel DL method solves the inverse problem: given response data of a boundary QFT, it determines a suitable bulk metric function, assuming the existence of a black hole horizon.
Our strategy is simple: we provide a deep neural network representation of a scalar field equation in (d + 1)-dimensional curved spacetime. The discretized holographic ("AdS radial") direction consists of the deep layers, see Fig. 1. The weights of the neural network to be trained are identified with a metric component of the curved spacetime. The input response data is given at the boundary of AdS, and the output binary data is given at the black hole horizon. First, we check that our framework works with the popular AdS Schwarzschild metric, by showing that the metric is successfully learned and reproduced by the DL. Then we proceed to use experimental data of the magnetic response of Sm 0.6 Sr 0.4 MnO 3 , known to have strong quantum fluctuations, and demonstrate the emergence of a bulk metric via the AdS/DL correspondence.
Our study gives a first concrete implementation of the AdS/CFT into deep neural networks. We show the emergence of a smooth geometry from given experimental data, which opens a possibility of revealing the mystery of the emergent geometry in the AdS/CFT with the help of the active research on DL. A similarity between the AdS/CFT and the DL was discussed recently [7][68], and it can also be discussed through tensor networks, i.e. the AdS/MERA correspondence [11][69].
Let us briefly review a standard deep neural network. It consists of layers (see Fig. 1), and between adjacent layers, a linear transformation x_i → W_ij x_j and a nonlinear transformation known as an activation function, x_i → ϕ(x_i), act successively. The final layer summarizes all the components of the vector into a single output. So the output of the neural network is
y(x^(1)) = F[ ϕ(W^(N−1) ϕ(W^(N−2) · · · ϕ(W^(1) x^(1)))) ] .  (1)
In the learning process, the variables of the network (W^(n)_ij) for n = 1, 2, · · · , N − 1 are updated by a gradient descent method with a given loss function of the L_1-norm error,
E = Σ_data | y(x̄^(1)) − ȳ | + E_reg .  (2)
Here the sum is over the whole set of pairs {(x̄^(1), ȳ)} of the input data x̄^(1) and the output data ȳ. The regularization E_reg is introduced to require expected properties for the weights [70].
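As an illustration, the map (1) and the loss (2) can be sketched in a few lines of Python. This is a minimal sketch with hypothetical layer sizes and a generic activation; the actual network of this paper uses the restricted weights and activation given below in (7) and (8).

```python
import numpy as np

def forward(x, weights, act=np.tanh, final=np.sum):
    """Eq. (1): alternate the linear maps W^(n) with an elementwise
    activation, then apply a final layer F that summarizes the vector
    (here simply a sum, as in the text)."""
    for W in weights:
        x = act(W @ x)
    return final(x)

def loss(outputs, labels, e_reg=0.0):
    """Eq. (2): the L1-norm error over the data set plus a regularization
    term E_reg."""
    return float(np.sum(np.abs(np.asarray(outputs) - np.asarray(labels))) + e_reg)
```
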
Neural network of scalar field in AdS.-Let us embed the scalar field theory into a deep neural network. A scalar field theory in a (d + 1)-dimensional curved spacetime is written as
S = ∫ d^(d+1)x √(−det g) [ −(1/2)(∂_μ φ)² − (1/2) m² φ² − V[φ] ] .  (3)
For simplicity we consider field configurations that depend only on η (the holographic direction). Here the generic metric is given by
ds² = −f(η) dt² + dη² + g(η) Σ_{i=1}^{d−1} (dx^i)² ,  (4)
with the asymptotic AdS boundary condition f ≈ g ≈ exp[2η/L] (η ≈ ∞) with the AdS radius L, and another boundary condition at the black hole horizon, f ≈ η², g ≈

FIG. 2: The simplest deep neural network reproducing the homogeneous scalar field equation in a curved spacetime. Weights W are shown explicitly by solid lines, while the activation is not shown.
const. (η ≈ 0). The classical equation of motion for φ(η) is
∂_η φ = π ,   ∂_η π + h(η) π − m² φ − δV[φ]/δφ = 0 ,  (5)
where we have defined π so that the equations become first order in derivatives. The metric dependence is summarized in the single function h(η) ≡ ∂_η log √( f(η) g(η)^(d−1) ). Discretizing the radial η direction, the equations are rewritten as
φ(η − ∆η) = φ(η) − ∆η π(η) ,
π(η − ∆η) = π(η) + ∆η [ h(η) π(η) − m² φ(η) − δV[φ(η)]/δφ ] .  (6)
We regard these equations as a propagation equation on a neural network, from the boundary η = ∞ where the input data (φ(∞), π(∞)) is given, to the black hole horizon η = 0 for the output data, see Fig. 2. The N layers of the deep neural network are a discretized radial direction η, which is the emergent space in AdS, η^(n) ≡ (N − n + 1)∆η. The input data x^(1)_i of the neural network is a two-dimensional real vector (φ(∞), π(∞))^T. So the linear algebra part of the neural network (the solid lines in Fig. 1) is automatically provided by
W^(n) = [ 1 , −∆η ; 0 , 1 + ∆η h(η^(n)) ] .  (7)
The activation function at each layer reproducing (6) is
ϕ(x_1, x_2) = ( x_1 , x_2 − ∆η [ m² x_1 + δV[x_1]/δx_1 ] ) .  (8)
The definitions (7) and (8) bring the scalar field system in curved geometry (3) into the form of the neural network (1) [71].
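In code, one layer is a 2 × 2 matrix multiplication followed by the nonlinear mass/potential shift. A minimal NumPy sketch, consistent with (7) and (8) up to O(∆η²); in a training code h_n would be the trainable parameter, as in our PyTorch implementation:

```python
import numpy as np

def layer(x, h_n, deta, m2=-1.0, lam=1.0):
    """One radial step of the bulk network: the linear map W^(n) of (7)
    followed by the activation (8). x = (phi, pi) at the layer eta^(n);
    h_n is the metric value h(eta^(n)) at that layer."""
    W = np.array([[1.0, -deta],
                  [0.0, 1.0 + deta * h_n]])
    x = W @ x
    dV = lam * x[0] ** 3  # delta V / delta phi for V[phi] = (lam/4) phi^4
    return np.array([x[0], x[1] - deta * (m2 * x[0] + dV)])
```

Composing N − 1 such layers from the boundary to the horizon gives the full network propagation (6).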
Response and input/output data.-In the AdS/CFT, asymptotically AdS spacetime provides a boundary condition of the scalar field corresponding to the response data of the quantum field theory (QFT). With the AdS radius L, asymptotically h(η) ≈ d/L. The external field value J (the coefficient of the non-normalizable mode of φ) and its response ⟨O⟩ (that of the normalizable mode) in the QFT are related, in the unit L = 1, to the boundary values of the field by a linear map [55],
( φ(η_ini), π(η_ini) )^T = M(η_ini) ( J, ⟨O⟩ )^T ,  (9)
where the 2 × 2 matrix M(η_ini) is fixed by the two asymptotic solutions e^((∆−d)η) and e^(−∆η) of (5), with ∆ ≡ d/2 + √(d²/4 + m²). The value η = η_ini ≈ ∞ is the regularized cutoff of the asymptotic AdS spacetime. We use (9) for converting the response data of the QFT to the input data of the neural network.
FIG. 3: The data generated by the discretized AdS Schwarzschild metric (11). Blue points are the positive data (y = 0) and the green points are the negative data (y = 1).
The input data at η = η_ini propagates in the neural network toward η = 0, the horizon. If the input data is positive, the output at the final layer should satisfy the boundary condition at the black hole horizon (see for example [56]),
0 = [ π − (η/2) ( m² φ + δV[φ]/δφ ) ] |_{η = η_fin} .  (10)
Here η = η_fin ≈ 0 is the horizon cutoff. Our final layer is defined by the map F given by the right hand side of (10), and the output data is y = 0 for positive-answer response data (J, ⟨O⟩). In the limit η_fin → 0, the condition (10) is equivalent to π(η = 0) = 0. With this definition of the network and the training data, we can make the deep neural network learn the metric component function h(η), the parameter m and the interaction V[φ]. The training uses a loss function E given by (2) [72]. Experiments provide only positive-answer data {(J, ⟨O⟩), y = 0}, while for the training we also need negative-answer data {(J, ⟨O⟩), y = 1}. It is easy to generate false response data (J, ⟨O⟩), and we assign the output y = 1 to them. To make the final output of the neural network binary, we use the function tanh |F| (or its variant) for the final layer rather than just F, because tanh |F| provides ≈ 1 for any input that is not close to zero.
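A minimal sketch of the binary-izing final layer and the corresponding L1 error (the smoothed variant of tanh |F| actually used in the code is given in the supplemental material):

```python
import math

def final_output(F):
    """tanh|F|: ~ 0 when the horizon condition (10) is met (F ~ 0),
    and ~ 1 for any F of order one or larger."""
    return math.tanh(abs(F))

def l1_loss(outputs, labels):
    """The L1-norm error of eq. (2), without the regularization term."""
    return sum(abs(y - ybar) for y, ybar in zip(outputs, labels))
```
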
Learning test: AdS Schwarzschild black hole.-To check whether this neural network can learn the bulk metric, we first demonstrate a learning test. We will see that, with data generated by a known AdS Schwarzschild metric, our neural network can learn and reproduce the metric [73]. We work here with d = 3 in the unit L = 1. The metric is
h(η) = 3 coth(3η)  (11)
and we discretize the η direction by N = 10 layers with η_ini = 1 and η_fin = 0.1. We fix for simplicity m² = −1 and V[φ] = (λ/4) φ⁴ with λ = 1. Then we generate positive-answer data with the neural network with the discretized (11), by collecting randomly generated (φ(η_ini), π(η_ini)) giving |F| < ε, where ε = 0.1 is a cut-off. The negative-answer data are similarly generated under the criterion |F| > ε. We collect 1000 positive and 1000 negative data points, see Fig. 3. Since we are interested in a smooth continuum limit of h(η) and in the horizon boundary condition, we adopt the regularization E_reg described in the supplemental material. We use PyTorch as a Python deep learning library to implement our network [74]. The initial metric is randomly chosen. Choosing the batch size equal to 10, we find that after 100 epochs of the training our deep neural network has successfully learned h(η): it coincides with (11), see Fig. 4. The statistical analysis with 50 learned metrics, Fig. 4(c), shows that the asymptotic AdS region is almost perfectly learned. The near-horizon region has a ≈ 30% systematic error, which is expected also for the following analysis with experimental data.
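The generation of the training data for this test can be sketched as follows. The sampling range of (φ, π) and the number of samples here are illustrative (the paper collects 1000 points of each class and then trains h(η^(n)) against them with PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)
N, eta_ini, eta_fin = 10, 1.0, 0.1
etas = np.linspace(eta_ini, eta_fin, N)   # eta^(n), boundary to horizon
deta = etas[0] - etas[1]
m2, lam, eps = -1.0, 1.0, 0.1

h_true = 3.0 / np.tanh(3.0 * etas)        # discretized AdS Schwarzschild, eq. (11)

def propagate(phi, pi):
    """Run (phi, pi) from eta_ini to eta_fin through the layers (6)."""
    for n in range(N - 1):
        phi, pi = (phi - deta * pi,
                   (1.0 + deta * h_true[n]) * pi
                   - deta * (m2 * phi + lam * phi ** 3))
    return phi, pi

# label each random boundary point by the horizon criterion |F| < eps
data = []
while len(data) < 200:
    phi0, pi0 = rng.uniform(-1.5, 1.5, size=2)
    _, F = propagate(phi0, pi0)
    data.append((phi0, pi0, 0 if abs(F) < eps else 1))
```
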
Emergent metric from experiments.-Since we have checked that the AdS Schwarzschild metric is successfully reproduced, we shall apply the deep neural network to learn a bulk geometry for given experimental data. We use the experimental data of the magnetization curve (the magnetization M [µ B /Mn] vs the external magnetic field H [Tesla]) for the 3-dimensional material Sm 0.6 Sr 0.4 MnO 3 , which is known to have strong quantum fluctuations [57], see Fig. 5. We employ a set of data at temperature 155 K, slightly above the critical temperature, since it exhibits a deviation from a linear M-H curve suggesting a strong correlation. To form the positive data we add a random noise around the experimental data, and we also generate negative data positioned away from the positive data [76]. The same neural network is used, except that we add a new zero-th layer which relates the experimental data (H, M) to (φ(η_ini), π(η_ini)), motivated by (9): we introduce normalization parameters α and β to relate (H, M) to the bulk φ, via (J, ⟨O⟩) = (αH, βM). In our numerical code we introduce a dimensionful parameter L_unit with which all the parameters are measured, in the unit L_unit = 1. We add another regularization term E_reg constraining the metric value near the horizon to match the standard horizon behavior 1/η; see the supplemental material for the details. We chose N = 10 and c^(2)_reg = 10^(−4). In the machine learning, m and λ, α and β are trained, as well as the metric function h(η).
We stopped the training when the loss becomes smaller than 0.02, and collected 13 successful cases. The emergent metric function h(η) obtained by the machine learning is shown in Fig. 6. It approaches a constant at the boundary, meaning that it is properly an asymptotically AdS spacetime. The obtained (dimensionless) parameters for the scalar field are m 2 L 2 = 5.6 ± 2.5, λ/L = 0.61 ± 0.22 [77]. In this manner, a holographic model is determined numerically from the experimental data, by the DL.
Summary and outlook.-We have built a bridge between two major subjects involving hidden dimensions: the AdS/CFT and the DL. We initiated a data-driven holographic modeling of quantum systems by formulating the gravity dual on a deep neural network. We showed that with an appropriate choice of the sparse network and the input/output data, the AdS/DL correspondence is properly formulated, and standard machine learning works nicely for the automatic emergence of the bulk gravity for given response data of the boundary quantum systems.
Our method can be applied to any holographic models. With vector fields in the bulk, not only h(η) but also other metric components can be determined by the DL. To explore the significance of the neural network representation of black hole horizons, the systematic error near the horizon would need to be reduced. Comparison with confining gauge theories, which give a Dirichlet condition as the output, could be helpful.
How can our study shed light on the mystery of the emergent spacetime in the AdS/CFT correspondence? A continuum limit of deep neural networks can accommodate arbitrarily nonlocal systems, as the network basically includes all-to-all inter-layer connections. So the emergence of the new spatial dimension would need a reduction of the full DL parameter space. A criterion for finding a properly sparse neural network which can accommodate local bulk theories is missing, and the question is similar to that of the AdS/CFT, where criteria for a QFT to have a gravity dual are still missing. At the same time, our work suggests that the bulk emergence could be a more generic phenomenon. For further exploration of the AdS/DL correspondence, we plan to formulate a "holographic autoencoder", motivated by a similarity between DL autoencoders and the cMERA at finite temperature [63,64], and also the thermofield formulation of the AdS/CFT [65,66]. Characterization of black hole horizons in DL may be a key to understand the bulk emergence.
We would like to thank H. Sakai. For related essays on the similarity between the AdS/CFT and the DL, see [8,9]; a continuum limit of the deep layers was studied in a different context [10].
[69] An application of DL or machine learning to quantum many-body problems is a rapidly developing subject. See [12] for one of the initial papers, together with more recent papers. For machine learning applied to the string landscape, see [48][49][50][51][52][53][54].
[70] In Bayesian neural networks, regularizations are introduced as a prior.

HAMILTONIAN SYSTEMS REALIZED BY DEEP NEURAL NETWORK
Here we show that a restricted class of Hamiltonian systems can be realized by a deep neural network with a local activation function [10]. We consider a generic Hamiltonian H(p, q) and its Hamilton equations, and seek a deep neural network representation (1) of the time evolution generated by H(p, q). The time direction is discretized to form the layers. (For our AdS/CFT examples, the radial evolution corresponds to the time direction of the Hamiltonian considered here.) Let us first try the following generic neural network and identify the time translation t → t + ∆t with the inter-layer propagation,
q(t + ∆t) = ϕ_1( W_11 q(t) + W_12 p(t) ) ,  p(t + ∆t) = ϕ_2( W_21 q(t) + W_22 p(t) ) .  (S.13)
Here the units of the network are directly identified with the canonical variables q(t) and p(t), and t = n∆t. We want the Hamilton equations to be representable in the form (S.13). It turns out that this is impossible except for free Hamiltonians.
In order for (S.13) to be consistent at ∆t = 0, we need to require
q = ϕ_1( W_11 q + W_12 p ) ,  p = ϕ_2( W_21 q + W_22 p )  at ∆t = 0 .  (S.14)
So we put an ansatz
W_11 = 1 + w_11 ∆t ,  W_12 = w_12 ∆t ,  W_21 = w_21 ∆t ,  W_22 = 1 + w_22 ∆t ,  ϕ_i(x) = x + ∆t g_i(x) ,  (S.15)
where w_ij (i, j = 1, 2) are constant parameters and g_i(x) (i = 1, 2) are nonlinear functions. Substituting these into the original (S.13) and taking the limit ∆t → 0, we obtain
q̇ = w_11 q + w_12 p + g_1(q) ,  ṗ = w_21 q + w_22 p + g_2(p) .  (S.16)
For these equations to be Hamilton equations, we need to require a symplectic structure,
∂/∂q ( w_11 q + w_12 p + g_1(q) ) + ∂/∂p ( w_21 q + w_22 p + g_2(p) ) = 0 .  (S.17)
However, this equation does not allow any nonlinear activation function g_i(x). So we conclude that a simple identification of the units of the neural network with the canonical variables allows only linear Hamilton equations, and thus free Hamiltonians.
In order for a deep neural network representation to allow generic nonlinear Hamilton equations, we need to improve our identification of the units with the canonical variables, and also of the layer propagation with the time translation. Let us instead try
x_i(t + ∆t) = Σ_j W̃_ij ϕ_j( Σ_k W_jk x_k(t) ) .  (S.18)
The difference from (S.13) is two-fold. First, we take i, j, k = 0, 1, 2, 3 with x_1 = q and x_2 = p, meaning that we have additional units x_0 and x_3. Second, we consider an additional multiplication by a linear W̃. So, in total, this is a successive action of a linear W, a nonlinear local ϕ and a linear W̃, and we interpret this set as a time translation ∆t. Since we pile up these sets as many layers, the last W̃ at t and the next W at t + ∆t are combined into a single linear transformation W_{t+∆t} W̃_t, so the standard form (1) of the deep neural network is kept. We arrange the following sparse weights and local activation functions,
W = [ 0 0 v 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 u 0 0 ] ,  ϕ(y) = ( f(y_0), y_1, y_2, g(y_3) ) ,
W̃ = [ 0 0 0 0 ; ∆t λ_1 , 1 + ∆t w_11 , ∆t w_12 , 0 ; 0 , ∆t w_21 , 1 + ∆t w_22 , ∆t λ_2 ; 0 0 0 0 ] ,  (S.19)
where u, v, w_ij (i, j = 1, 2), λ_1 and λ_2 are constant weights, and f and g are nonlinear local activation functions. The network is shown in the right panel of Fig. 7. Using this definition of the time translation, we arrive at
q̇ = w_11 q + w_12 p + λ_1 f(vp) ,  ṗ = w_21 q + w_22 p + λ_2 g(uq) .
(S.20) Then the symplectic constraint means w_11 + w_22 = 0, and the Hamiltonian is given by
H = w_11 q p + (w_12/2) p² − (w_21/2) q² + (λ_1/v) F(vp) − (λ_2/u) G(uq) ,  F′ = f ,  G′ = g .  (S.21)
This is the generic form of the nonlinear Hamiltonians which admit a deep neural network representation. Our scalar field equation in the curved geometry (5) is within this category. For example, choosing
w_11 = w_21 = 0 ,  λ_1 = 0 ,  w_12 = 1/m ,  λ_2 g(uq) = −V′(q)  (S.22)
means a popular Hamiltonian for a non-relativistic particle moving in a potential,
H = p²/(2m) + V(q) .  (S.23)
A more involved identification of the time translation and the layer propagation may be able to accommodate Hamiltonians which are not of the form (S.21). We leave a generic argument for future investigation.
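As a quick consistency check of (S.22)-(S.23), one can iterate the corresponding network layer numerically; for the harmonic potential V(q) = q²/2 the flow should stay close to the exact (cos t, −sin t) trajectory. A sketch (the layer is an Euler discretization, so the agreement holds only up to O(∆t)):

```python
import math

def network_step(q, p, dt, m=1.0, dV=lambda q: q):
    """One layer of the sparse network (S.18)-(S.19) with the choice
    (S.22): w11 = w21 = 0, lambda1 = 0, w12 = 1/m, lambda2 g(uq) = -V'(q).
    dV is V'(q); the default dV(q) = q is the harmonic potential."""
    return q + dt * p / m, p - dt * dV(q)

q, p, dt = 1.0, 0.0, 1.0e-4
for _ in range(10000):  # evolve to t = 1
    q, p = network_step(q, p, dt)
```
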

ERROR FUNCTION OF THE ADS SCALAR SYSTEM
For λ = 0, we can obtain an explicit expression for the error function (loss function) of the machine learning in our AdS scalar field system. The scalar field equation (5) can be formally solved in a path-ordered form,
( φ(η), π(η) )^T = P exp[ −∫_η^{η_ini} dη′ M(η′) ] ( φ(η_ini), π(η_ini) )^T ,  M(η) ≡ [ 0 , 1 ; m² , −h(η) ] .
So, in the continuum limit of the discretized neural network, the output is provided as
F = π(η_fin) = [ P exp( −∫_{η_fin}^{η_ini} dη′ M(η′) ) ]_{2j} x_j(η_ini) .
Then the error function (2) is provided as
E = Σ_data | tanh |F| − ȳ | + E_reg .
The learning process is equivalent to the following gradient flow equation with a fictitious time variable τ,
∂h(η)/∂τ = − δE/δh(η) .
For the training of our numerical experiment using the experimental data, we have chosen the initial configuration of h(η) to be a constant (which corresponds to a pure AdS metric). For a constant h(η) = h, the error function can be explicitly evaluated with
λ_± = ( −h ± √(h² + 4m²) ) / 2 ,
the eigenvalues of the matrix M which is path-ordered. Using this expression, we find that at the initial epoch of the training the function h(η) is updated by an addition of a function of the form exp[(λ_+ − λ_−)η] and of the form exp[−(λ_+ − λ_−)η]. This means that the update is effective in two regions: near the black hole horizon η ≈ 0 and near the AdS boundary η ≈ ∞.
Normally in deep learning the update is effective near the output layer, because the back propagation can be suppressed by the factors of the activation function. However, our example above shows that the weights near the input layer are also updated. The reason for this difference is that in the example above we assumed λ = 0 to solve the error function explicitly, which means that the activation function is trivial. In our numerical simulations where λ ≠ 0, the back propagation is expected to be suppressed near the input layer.
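The path-ordered exponential above can be evaluated numerically as an ordered matrix product; for constant h it reduces to the matrix exponential of M with the eigenvalues λ_±. A sketch (a first-order Euler discretization of the ordered product, so the comparison holds only up to O(∆η)):

```python
import numpy as np

def transfer_matrix(h, eta_ini, eta_fin, m2, steps=50000):
    """Ordered product approximating P exp[-int M deta] for the lambda = 0
    system dphi/deta = pi, dpi/deta = -h(eta) pi + m^2 phi, integrated
    from eta_ini down to eta_fin."""
    deta = (eta_ini - eta_fin) / steps
    U = np.eye(2)
    eta = eta_ini
    for _ in range(steps):
        M = np.array([[0.0, 1.0], [m2, -h(eta)]])
        U = (np.eye(2) - deta * M) @ U  # one step toward smaller eta
        eta -= deta
    return U

# constant h: the eigenvalues of M are lambda_pm = (-h +/- sqrt(h^2 + 4 m^2))/2
h0, m2 = 3.0, 1.0
lam_p = (-h0 + np.sqrt(h0 ** 2 + 4 * m2)) / 2
lam_m = (-h0 - np.sqrt(h0 ** 2 + 4 * m2)) / 2
```
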

BLACK HOLE METRIC AND COORDINATE SYSTEMS
Here we summarize the properties of the bulk metric and the coordinate frame which we prefer to use in the main text.
The 4-dimensional AdS Schwarzschild black hole metric is given by
ds² = (r²/L²) [ −f(r) dt² + Σ_i (dx^i)² ] + L²/(r² f(r)) dr² ,  f(r) = 1 − r_0³/r³ ,  (S.28)
where L is the AdS radius, and r = r_0 is the location of the black hole horizon. r = ∞ corresponds to the AdS boundary. To bring it to the form (4),
ds² = −f(η) dt² + dη² + g(η) Σ_i (dx^i)² ,  (S.29)
we make a coordinate transformation
r(η)³ = r_0³ cosh²( 3η/(2L) ) .  (S.30)
With this coordinate η, the metric components are given by
f(η) = r(η)² f(r(η))/L² ,  g(η) = r(η)²/L² .  (S.31)
The AdS boundary is located at η = ∞ while the black hole horizon resides at η = 0. The function h(η) = ∂_η log √( f(η) g(η)² ) appearing in the scalar field equation (5) is
h(η) = (3/L) coth(3η/L) .  (S.32)
The r_0 dependence, and hence the temperature dependence, disappears because our scalar field equation (5) assumes time independence and x^i-independence. This h(η) is basically the logarithmic η-derivative of the invariant volume element of the spacetime, and is important in the sense that a certain tensor component of the vacuum Einstein equation coming from the ansatz (S.29) results in a closed form,
∂_η h(η) + h(η)² = 9/L² .  (S.33)
It can be shown that the ansatz (S.29) leads to a unique metric solution for the vacuum Einstein equations, and the solution is given by (S.32) up to a constant shift of η. With matter, the same component of the Einstein equation is modified schematically to
∂_η h(η) + h(η)² − 9/L² = ( a linear combination of the energy-momentum tensor components ) .  (S.34)
Generically, whatever the temperature is and whatever the matter energy-momentum tensor is, the metric function h(η) behaves as h(η) ≈ 1/η near the horizon η ≈ 0, and goes to the constant 3/L at the AdS boundary η ≈ ∞.
One may try to impose some physical condition on h(η). In fact, the right hand side of (S.34) is a linear combination of the energy-momentum tensor components, and generally we expect that the energy-momentum tensor is subject to various energy conditions, which may constrain the η-evolution of h(η). Unfortunately, within our search, it turns out that a suitable energy condition constraining h(η) is not available. So non-monotonic functions of η are allowed as learned metrics.
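The statements above are easy to check numerically: the metric function (S.32) satisfies the closed-form vacuum equation (S.33), behaves as 1/η near the horizon, and approaches the constant 3/L at the boundary. A quick finite-difference check in the unit L = 1:

```python
import math

def h(eta):
    """The AdS Schwarzschild metric function (S.32) in the unit L = 1."""
    return 3.0 / math.tanh(3.0 * eta)

# finite-difference check of the vacuum equation h' + h^2 = 9  (S.33)
eta, d = 0.7, 1.0e-6
hp = (h(eta + d) - h(eta - d)) / (2 * d)
residual = hp + h(eta) ** 2 - 9.0
```
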

Comments on the regularization
Before getting into the detailed presentation of the coding, let us make some comments on the effect of the regularization E reg and the statistical analysis of the learning trials.
First, we discuss the meaning of E_reg in (2). In the first numerical experiment, the reproduction of the AdS Schwarzschild black hole metric, we took
E_reg = c_reg Σ_n ( h(η^(n+1)) − h(η^(n)) )² .  (S.35)
This regularization term works as a selection of metrics which are smooth. We are interested in metrics for which we can take a continuum limit, so a smooth h(η) is better for our physical interpretation. Without E_reg, the learned metrics are far from the AdS Schwarzschild metric: see Fig. 8 for an example of a metric learned without E_reg. Note that the example in Fig. 8 achieves an accuracy of the same order as that of the metric learned with E_reg. So, in effect, this regularization term does not spoil the learning process, but picks up the smooth metrics among the learned metrics achieving the same accuracy.
Second, we discuss how generic the learned metric shown in Fig. 4 is, for the case of the first numerical experiment. We have collected the results of 50 trials of the machine learning, and the statistical analysis is presented in Fig. 4(c). It shows that the metric in the asymptotic region is quite nicely learned, and we can conclude that the asymptotic AdS spacetime has been learned properly. On the other hand, in the region near the black hole horizon, the learned metric reproduces the qualitative behavior around the horizon, but quantitatively it deviates from the true metric. This could be due to the discretization of the spacetime.
Third, let us discuss the regularization for the second numerical experiment, the emergence of the metric from the condensed matter material data. The regularization used is
E_reg = c^(1)_reg Σ_n ( h(η^(n+1)) − h(η^(n)) )² + c^(2)_reg ( h(η^(N)) − 1/η^(N) )² ,  (S.36)
with c^(2)_reg = 10^(−4). The second term fits the metric h(η) near the horizon to the value 1/η, because the 1/η behavior is expected for any regular horizon. In Fig. 9, we present our statistical analyses of the obtained metrics for two other distinct choices of the regularization parameter. For c^(2)_reg = 0 there is no horizon regularization, so the metric goes down to a negative value at the horizon. For a much larger c^(2)_reg, which is a strong regularization, the metric is almost completely fixed to the value 1/η with η = η^(N). For all cases, the learned metrics achieve a loss ≈ 0.02, so the system is successfully learned. The only difference is how we pick up "physically sensible" metrics among many learned metrics. In Fig. 6, we chose c^(2)_reg = 10^(−4), which is in between the values used in Fig. 9, because the deviation of the metric near the horizon is then of the same order as that near the asymptotic region.
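For concreteness, the regularization (S.36) can be sketched as follows. The smoothness coefficient c1 here is a hypothetical stand-in (only c2 = c^(2)_reg = 10^(-4) is quoted in the text):

```python
import numpy as np

def e_reg(h, etas, c1=1.0e-2, c2=1.0e-4):
    """Regularization E_reg of (S.36): a smoothness term on h(eta) plus a
    term pinning the near-horizon layer to the regular-horizon value
    1/eta. h and etas are arrays over the layers, boundary to horizon."""
    smooth = c1 * np.sum(np.diff(h) ** 2)
    horizon = c2 * (h[-1] - 1.0 / etas[-1]) ** 2
    return float(smooth + horizon)
```
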

Numerical experiment 1: Reconstructing AdS Schwarzschild black hole
We have performed two independent numerical experiments: the first one is about the reconstruction of the AdS Schwarzschild black hole metric, and the second one is about the emergence of a metric from the experimental data of the material Sm 0.6 Sr 0.4 MnO 3 . For the first experiment, we prepare data {(x̄^(1), ȳ)} to train the neural network. The training data is just a list of initial pairs x̄^(1) = (φ, π) and corresponding answer signals ȳ. We regard x̄^(1) = (φ, π) as field values at the AdS boundary, and define the answer signal so that it represents whether they are permissible or not when they propagate toward the black hole horizon.
To train the network appropriately, it is better to prepare data containing roughly equal numbers of ȳ = 0 samples and ȳ = 1 samples. We take a naive strategy here: if the result of step 3 is ȳ = 0, we add the sample (x̄^(1), ȳ) to the positive data category; if not, we add the sample to the negative data category. Once the number of samples of one category saturates at 10³, we focus on collecting samples of the other category. After collecting both, we concatenate the positive and negative data and regard the union as the total data for the training:
Training data D = { 10³ positive data } ⊕ { 10³ negative data } ,
where positive data = {(x̄^(1), ȳ = 0)} and negative data = {(x̄^(1), ȳ = 1)}.
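The naive balanced-sampling strategy above can be sketched as follows; here sample_fn is a hypothetical stand-in for the data-generating steps, returning one pair (x̄^(1), ȳ):

```python
import random

def collect_balanced(sample_fn, target=1000):
    """Keep drawing (x, ybar) pairs until both the positive (ybar = 0)
    and the negative (ybar = 1) category reach `target` samples, then
    concatenate them into the total training data D."""
    pos, neg = [], []
    while len(pos) < target or len(neg) < target:
        x, ybar = sample_fn()
        if ybar == 0 and len(pos) < target:
            pos.append((x, ybar))
        elif ybar == 1 and len(neg) < target:
            neg.append((x, ybar))
    return pos + neg
```

Samples drawn for an already-saturated category are simply discarded, matching the strategy described in the text.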
Besides this, we prepare the neural network (1) with the restricted weights (7). The only trainable parameters are h(η^(n)), and the purpose of this experiment is to see whether the trained h(η^(n)) agree with the AdS Schwarzschild metric (11) encoded implicitly in the training data. To compare ȳ with the neural-network output y, we construct the following final layer. First, we calculate F ≡ π(η_fin) (which is the r.h.s. of (10) in the limit η_fin → 0), and second, we define y ≡ t(F) where
t(F) = [ tanh(100(F − 0.1)) − tanh(100(F + 0.1)) + 2 ] / 2 .  (S.38)
We plot the shape of t(F) in Fig. 10. Before running the training iteration, we should take certain initial values for h(η^(n)). We use the initial h(η^(n)) ∼ N(1/η^(n), 1) (a Gaussian distribution centered at 1/η^(n)), because any black hole horizon is characterized by the 1/η^(n) behavior at η^(n) ≈ 0 [11]. After setting the initial values of the trained parameters, we repeat the training iteration:
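Incidentally, the window function (S.38) is easy to verify directly: it returns ≈ 0 for |F| < 0.1 (positive answer) and ≈ 1 otherwise (negative answer):

```python
import math

def t(F):
    """The final layer (S.38): a smoothed window mapping the horizon
    quantity F to the binary answer y (~0 inside |F| < 0.1, ~1 outside)."""
    return (math.tanh(100 * (F - 0.1)) - math.tanh(100 * (F + 0.1)) + 2) / 2
```
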