Construction of quasi-potentials for stochastic dynamical systems: an optimization approach

The construction of effective and informative landscapes for stochastic dynamical systems has proven a long-standing and complex problem. In many situations, the dynamics may be described by a Langevin equation while constructing a landscape comes down to obtaining the quasi-potential, a scalar function that quantifies the likelihood of reaching each point in the state-space. In this work we provide a novel method for constructing such landscapes by extending a tool from control theory: the Sum-of-Squares method for generating Lyapunov functions. Applicable to any system described by polynomials, this method provides an analytical polynomial expression for the potential landscape, in which the coefficients of the polynomial are obtained via a convex optimization problem. The resulting landscapes are based upon a decomposition of the deterministic dynamics of the original system, formed in terms of the gradient of the potential and a remaining"curl"component. By satisfying the condition that the inner product of the gradient of the potential and the remaining dynamics is everywhere negative, our derived landscapes provide both upper and lower bounds on the true quasi-potential; these bounds becoming tight if the decomposition is orthogonal. The method is demonstrated to correctly compute the quasi-potential for high-dimensional linear systems and also for a number of nonlinear examples.


I. INTRODUCTION
Multi-dimensional nonlinear dynamical systems can exhibit complex behavior including limit cycles, strange attractors and multiple fixed points.When also driven by stochastic perturbations, such systems often explore a range of conditions and may exhibit behavior such as random switching between attractors and stochastic resonance [1].The mathematics of such systems has its roots in the description of Brownian motion, which motivated the study of stochastic differential equations, generally written in the form, dx = f (x) dt + g(x) dW t . ( Here the deterministic component of the dynamics (the drift) is described by f (x), while the stochastic component (diffusion) is given by g(x) dW t , where dW t describes the increment of a Wiener process [2].
A popular description of such systems is in terms of an energy landscape, in which the dynamics are described as a ball moving in a potential basin.In some contexts, the landscape may simply give an intuitive description of the dynamics [3,4], while in others it may represent a true energy function [5].In the context of developmental biology, Waddington's epigenetic landscapes provide a popular analogy for stem cell development [6,7], even though it has proven impossible to define a true energy function for general developmental processes.
If correctly formed, landscapes may offer a quantitative analysis and interrogation tool beyond a merely phenomenological description, even in cases where the free * r.brackston13@imperial.ac.uk energy of the system may not be defined.In particular, a so-called quasi-potential landscape may be defined in terms of the Freidlin-Wentzell action functional [8].This landscape provides quantitatively a measure of the relative stability of different states and the most probable paths between them.The concept of landscapes defined by the action has received particular recent attention [9], as has the development of methods to calculate the so-called Minimum Action Path (MAP) [10][11][12].Other recent methods have sought to evaluate quasi-potential landscapes with a focus on the action functional.For linear systems an analytic expression for the quasi-potential was obtained by [13], while numerical methods applicable to nonlinear systems have been given in [14][15][16].While such methods may offer a numerical solution over a discretized space, in this work we present a method that generates an analytical solution, and furthermore is applicable to both linear and nonlinear systems.
Given the motivation behind landscape descriptions, it is unsurprising that considerable effort has been applied to find other methods for landscape evaluation based either upon experimental data or mathematical models [17,18].In biological applications, the most popular and readily applied method is that based upon the steady-state probability distribution [19].This approach is based upon the Fokker-Planck equation, which justifies computing a potential as U (x) ∝ − ln(P S (x)).In practice the steady-state distribution is either obtained by solving the Fokker-Planck equation directly [20][21][22], or found from extensive simulations of the corresponding SDE [23][24][25].While this method is easy to implement, it may be impractical for higher-dimensional systems and there is generally no guarantee that the derived landscape relates directly to the fixed points of the deterministic system.
Irrespective of how a landscape is obtained, its existence implies a decomposition of the deterministic vector field into two components, one of which is given by the gradient of the landscape while the other accounts for the remainder of the dynamics in f (x): (2) The construction of the landscape may therefore also be performed by considering the properties of this decomposition directly.As discussed by [8] and advocated by [26], one particularly interesting case is that in which the gradient and remainder are everywhere orthogonal, implying an independence between gradient-based and rotational dynamics.While this property may be satisfied in some cases, it is important to note that it remains unclear if this is always possible.Motivated by this, we develop here a method to form landscapes satisfying the slightly more general sub-orthogonal condition that ∇U (x) • f U (x) ≤ 0, of which the orthogonal decomposition is a special case.As we shall discuss below, the sub-orthogonal decomposition has the desirable property that under certain conditions it is directly related to the quasi-potential.One of the other desirable properties of the landscape, and one that is satisfied by the sub-orthogonal decomposition, is that it should be a Lyapunov function [27] for the deterministic dynamics.In this context, Lyapunov functions are often used to prove the asymptotic stability of the fixed points f (x) = 0.However, they also have the useful property that they provide the basins of attraction for the deterministic dynamics.There is therefore a strong equivalence between quasi-potential landscapes and Lyapunov functions, suggesting that preexisting methods for generating such functions may also be used to obtain the quasi-potential.This is the approach that we take here.
In this paper, we develop a method that performs a sub-orthogonal decomposition of the deterministic dynamics f (x), by utilizing a popular method for the construction of Lyapunov functions: the Sum-of-Squares (SOS) method [28].We will firstly provide some background on the properties of the sub-orthogonal decomposition in section II before providing details of this computational method in section III.We will then examine its application to a series of examples.The first of these examples will be linear systems in section IV for which reference analytical solutions exist, followed by application to nonlinear systems with various properties in section V. Conclusions will finally be given in section VI.

II. PROPERTIES OF THE SUB-ORTHOGONAL DECOMPOSITION
In this work we consider the case in which the deterministic part of (1) is decomposed as in (2), and this decomposition has the property that, When this relation becomes an equality, the decomposition can be said to be orthogonal, while otherwise we will refer to the decomposition as sub-orthogonal.As we shall now discuss, such a decomposition of the vector field has a useful relation to the quasi-potential and furthermore provides a Lyapunov function for the deterministic dynamics.
A. The quasi-potential Let us now restrict our analysis to systems in which the diffusion tensor g(x) is a uniform diagonal matrix, describing purely additive noise, Here, σ is a small constant and I n is the n × n identity matrix.Given such a system, the action associated with a path x(t) = ϕ is given by the following integral, Equation ( 5) is useful because in the limit, σ → 0, the probability of the stochastic system following a certain path is directly related to the action of that path according to, Given the properties of the action, one can define a quasi-potential Q a with respect to a fixed point a as, This quasi-potential therefore gives the action associated with the minimum action path to each point within the basin of attraction of the fixed point, and in turn may be related to the probability distribution over the state space.
Given the sub-orthogonality condition of (3), the quasi-potential may be evaluated as, In the orthogonal case, term (ii) is equal to zero, while the infimum of term (i) will approach zero for a path arbitrarily close to that satisfying the ODE ẋ = ∇U +f U [29].For the sub-orthogonal case the net contributions of both terms will be positive.The potential U therefore provides a lower bound for the quasi-potential according to, In the case of the orthogonal decomposition this inequality becomes an equality.

B. Lyapunov functions
Lyapunov functions serve an important purpose in the field of non-linear dynamical systems and feedback control, in proving asymptotic stability of a fixed point.While the proof of asymptotic stability requires specific conditions to be satisfied, here we use a slightly looser definition that allows the Lyapunov function to have wider interpretation.For a system defined by a set of ODEs ẋ = f (x) a Lyapunov function U (x) : R n → R, is one that satisfies the following constraints: These constraints impose the requirements that U is everywhere positive and that U decreases along trajectories of f (x).The key implication of these constraints is that the Lyapunov function correctly captures the basins of attraction for all the stable fixed points: a region of the state space A around a fixed point a such that if x(0) ∈ A, x(t) → a as t → ∞.Such a function therefore provides an accurate qualitative description of a landscape underlying the dynamics.Given the sub-orthogonal decomposition property of (3), we may rewrite constraint (9b) as, Sub-orthogonality of the decomposition is therefore a sufficient condition for the potential U to be a Lyapunov function for the deterministic dynamics described by f .We therefore use this condition in our optimization approach discussed below.

III. THE OPTIMIZATION METHOD
In this work we develop a method for landscape formation based on the sum-of-squares method for forming Lyapunov functions.We will therefore first give a brief overview of the standard algorithm through which SOS is used to find Lyapunov functions, before detailing the key changes required in our algorithm to achieve a suborthogonal decomposition.The algorithm itself has been implemented in the programming language Julia, built upon the JuMP package for mathematical optimization [30] and in Matlab using the SOStools package [31].

A. The standard SOS method
Finding a Lyapunov function for nonlinear dynamical systems is generally a difficult problem [27,32], and is known to be NP-hard in the case where U is a polynomial [33]: i.e. given a possible function U , even checking that it satisfies the requirements may be very computationally expensive.The basis of the SOS method is that polynomial functions formed as the sum of the squares of lower order polynomials are guaranteed to be positive-definite, and furthermore can be found via efficient optimization methods.While the requirement that U (x) is SOS is stricter than that of positive-definiteness, it makes the problem computationally tractable.
The standard use of SOS in forming a Lyapunov function involves solving the following: Here the p j are a suitable set of m monomial terms (e.g. ) and the c j are real coefficients.The parameter is a positive constant, typically of order one.The task therefore involves a pure feasibility problem, and comes down to finding any U composed of the monomials specified in (11a), subject to the inequality constraints (11b), (11c) [34].There are therefore infinitely many possible solutions, if f (x) describes a stable system.As we will discuss below, we modify each of the steps of problem (11) to produce a convex optimization problem that yields a unique result.

B. Towards orthogonality
While the SOS method is able to generate Lyapunov functions for general n-dimensional dynamical systems, such functions are not unique and will generally not satisfy the sub-orthogonality requirement of (3).As noted by several previous authors [14,26], an orthogonality requirement results in a Hamilton-Jacobi equation ∇U • f + ||∇U || 2 = 0 which is a nonlinear constraint on U (x).Such constraints cannot be directly implemented into the SOS program.In order to implement such a constraint in a linear manner, consider the matrix A Schur Complement argument [35] implies that M U 0 is positive definite for any x ∈ R n if and only if which is equivalent to ∇U •f U ≤ 0, where f U is as defined in (2).This in turn implies that ∇U • f ≤ 0, which satisfies the key requirement for a Lyapunov function and constraint (11c).With the matrix inequality constraint in place, there will still be an infinite number of possible U , most of which will not come close to orthogonality.
In order to find the unique U that minimizes |∇U • f U |, we find it necessary to make U as steep as possible, in this way maximizing the contribution of −∇U to f .This is achieved by maximizing an appropriately chosen lower bound, where each b i is a monomial in x and the i are real coefficients.The choice of these monomials is important and the method for choosing them is described below in section III C. The complete optimization procedure is therefore as follows: maximize For the case that f (x) is linear, this method can be proven to yield the correct result, as detailed in appendix A.

C. Choosing the monomial basis and lower bound
The first key choice that must be made is that of the monomial basis from which U is constructed, i.e. the p j in (15b).In the typical application of SOS to form Lyapunov functions, a candidate function is formed from a full collection of the monomials in x up to a given (even) degree d [36].For a given n-dimensional system, this results in m = n+d d monomials forming the polynomial function U [37].For example, for a 4th order polynomial and 4-dimensional state space this results in 40 monomials.Since the computational cost of the optimization increases with m, it is advantageous to reduce the basis if possible, by exploiting particular properties of the function we are trying to obtain.

A minimal basis
We can initially provide an upper bound on the degree d using the sub-orthogonality condition (3), which may be rewritten as, Since |∇U | 2 ≥ 0, the above inequality can only hold if the degree of ∇V • f is at least as big as the degree of |∇U | 2 .If d is the degree of U and e the degree of f , then d + e − 1 ≥ 2d − 2, meaning that, While ( 17) provides an upper bound on the degree of U , we may further refine the basis (15b), motivated by the properties in the pure gradient case, in which f i = −∂U/∂x i .The minimal basis for such a U is therefore provided by where the operator M extracts a vector of monomials from a polynomial.
As an example, consider the vector field defined by, Here the highest order in f is three so (17) implies that we will likely require a d = 4 order polynomial.Given that the system has n = 3 dimensions, the standard method leads to 35 monomials.By contrast, a minimal basis would only include five terms, [x 4 1 , x 4 2 , x 4 3 , x 2 1 , 1].

The lower bound
Given a monomial basis for U , it is next necessary to choose the monomials b i in the lower bound (14).This choice is best illustrated via the following simple system described by, Given that ( 20) is one-dimensional and therefore may be written as a pure gradient system, the minimal basis argument of (18) correctly ascertains that U (x) will be composed only of monomials [x 4 , x 2 , x, 1].A sensible choice for the lower bound will be b(x) = x q , where q is an even integer.The inequality constraint (15c) therefore becomes: where the choice of q is of critical importance.In particular it must satisfy two particular properties: a. Sufficient pressure Suppose that we take q = 2, now for any c 3 , c 0 > 0 the constraint may be satisfied.Yet c 3 may remain arbitrarily small and the problem will not have converged.The lower bound is therefore unable to exert sufficient "upward pressure" on U .
b. Feasibility Alternatively consider q = 6.In this case it is impossible to find coefficients such that U > x 6 , since as x → ∞, c 3 x 4 < x 6 for any c 3 , > 0. The program will therefore be infeasible.
The correct choice for q in this case is q = 4, i.e. the highest degree in the basis for U .An example of this lower bound in practice is displayed in figure 1.The same principles of sufficient upward pressure and feasibility also apply to higher dimensional systems, such that the lower bound is chosen to consist of the highest degree single and mixed monomials in the basis for U .

Extending the basis
A correctly chosen "minimal" basis will naturally work for the case that f (x) is a pure gradient system, and may sometimes work in other cases [38].However more generally, our constraints may require that ∇U contains monomials beyond (18), as will be illustrated in later examples.Nonetheless, in addition to considerations of computational cost, the sensitivity of the method to the choice of lower bound means that we cannot simply choose all monomials up to degree d, as is done in the standard SOS method.
Consider again the example given in (20).Based on the above discussion it is clear that we could not have included an additional higher-order monomial in the basis for U .For example had we included a x 6 term, we would logically have also chosen q = 6 for the lower bound.Since the actual potential need not include a x 6 term, this would make the program infeasible.The principles of sufficient pressure and feasibility in the choice of the lower bound, therefore guide which monomial terms are permitted in the basis for U .The procedure for determining the basis is therefore as follows: 1.The highest order monomial in any individual x i should be that obtained from the minimal basis.
2. Other mixed monomials are required, provided that: (a) the total order is not more than the highest total order in the minimal basis, (b) the individual order in each x i is not more than the highest individual order monomial in the minimal basis.

D. Iterative improvement
The process outlined above performs well for many systems, but may require further improvement to push the obtained landscape closer to orthogonality.Given an initial Lyapunov function U 1 such as that found by implementing (15), one can attempt to iteratively improve the solution, generating at each step a second Lyapunov function U 2 composed of the same monomial basis.Each iteration involves solving the following optimization problem: The key to this optimization is that by using the initial guess, U 1 , the optimization is constructed to be linear in the new improved Lyapunov function, U 2 .Constraints (22c) and (22d), simply mirror those in optimization (15), namely positive definiteness of the Lyapunov function and the sub-orthogonality condition.Constraint (22e) is chosen such that for α = 1 equality is guaranteed if This result is proven in Appendix B, however the key point is that the optimization has a guaranteed feasible solution with α = 1 and U 2 = U 1 , while if a smaller α is obtained then the new Lyapunov function is closer to orthogonality.The complete method for obtaining the potential landscape therefore consists of first applying procedure ( 15), followed by a number of iterations of procedure (22).We generally find that only one or two iterations of this method are required for satisfactory convergence.The number of iterations may therefore either be pre-specified or determined based upon a convergence criterion.

IV. THE LINEAR CASE
A special case of the orthogonal decomposition may be considered when the deterministic system is linear [14].While this may seem a very limiting test case, the linear situation can exemplify some of the key scenarios and limitations that also apply in the nonlinear case, while remaining tractable to analysis.

A. Properties of linear systems
Linear dynamics may be written in terms of a matrix multiplication as, where the matrix A ∈ R n×n .The construction of the quasi-potential then comes down to decomposing the matrix A into two parts as where the gradient matrix A g is a symmetric matrix that gives the potential according to Because A g is symmetric, it has purely real eigenvalues, while if the decomposition is orthogonal, the remainder matrix A c has purely imaginary eigenvalues and is such that the product A g A c is an antisymmetric matrix.
In such linear cases it has been demonstrated that an orthogonal decomposition does always exist [39], and furthermore may be provided by an analytical expression [13] as, If the matrix A is normal (AA * = A * A), then this expression simplifies to, Since these expressions provide solutions against which our method can be tested, linear systems give a perfect test case for our optimization-based approach which does not rely on any prior knowledge of the properties of the system.

B. A three-dimensional example
We consider now the test-case of a three-dimensional linear system defined by, Implementation of methods ( 15) and ( 22 giving the quasi-potential as, As can readily be observed, the matrix A g is symmetric and can easily be verified to have purely real (negative) eigenvalues.The matrix A c is not itself either symmetric or antisymmetric but can be verified to have purely imaginary eigenvalues.The dynamics associated with A c are therefore those of pure oscillation, and evolve on level sets of the potential U .
It is noteworthy that for the system defined by A, there is no direct connection between x 1 and x 2 .One may therefore naively expect such a direct connection to also be absent in the potential.Nevertheless, in order to provide an orthogonal decomposition, the potential matrix A g does include such terms that are then negated in A c .This confirms that a truly minimal basis is indeed insufficient for forming the quasi-potential, as discussed in section III C 3.
FIG. 2. Scaling of the computational cost with system size.Tests were performed on a basic desktop PC with 16 Gb ram and an Intel Core i3 processor.

C. Scaling with system size
While the method may be verified to work for this three-dimensional example, it is also useful to assess the accuracy and efficiency of the algorithm for higher dimensional systems.We do this by randomly generating real, negative-definite matrices of varying size n = 2 : 10.In all such cases the method is able to perform a decomposition that satisfies the orthogonality constraint.Results for the computational cost are shown in figure 2, displayed on a logarithmic scale.It is evident that the cost increases considerably with system size, most likely in a factorial fashion.For small systems the algorithm converges in less than a second, while for larger systems it may take up to an hour.
It is worth noting here that some of the additional computational cost associated with larger systems arises from a greater number of times that the iteration procedure ( 22) must be applied, given a desired level of orthogonality.For low-dimensional problems we find that sometimes no iterations are required, while for systems of dimension n = 10, up to five iterations may be necessary to achieve the same quality of result.

V. THE NONLINEAR CASE
Having studied the ability of our method to obtain the decomposition in linear cases, we now move on to fully non-linear examples for which analytical solutions do not always exist.

A. A nonlinear multistable system
Our first example is the quartic system from [26] with four attractors and a known orthogonal decomposition.The dynamics are defined by, for which the true potential is given by, For such a problem our method finds the potential to within 5 significant figures within a few seconds.Because the output of the algorithm is a symbolic expression for the potential, the actual value of U may then be evaluated at any required points in the space with minimal further effort.This is in contrast of course, to methods that solve the associated PDE over a discretized grid, since evaluation outside of the solved area requires significant further computation.

B. The Maier-Stein model
We next apply the method to the widely studied Maier-Stein model of [40], defined by, This model has the property that when γ = µ, the dynamics can be expressed in terms of a pure potential [41] f (x) = −∇U , where, For cases in which γ = µ, the system also has a nongradient component.We apply our method for three different parameter combinations, obtaining in each case a polynomial expression for the potential with the same non-zero terms.The coefficients for these terms are displayed in table I.For the non-gradient cases, the decompositions are, as desired, sub-orthogonal.

C. Bounds on the quasi-potential
A useful property of the (sub-)orthogonal decomposition is that it may be used to obtain predicted MAPs.As discussed in section II, MAPs are those paths that minimize the action functional and provide a definition of the true quasi-potential.In general these paths must be found via a costly optimization that must be performed for every point in the state-space.For a system that satisfies the orthogonal decomposition however, the MAP from a fixed point x o to another point x e within the basin of attraction, is one that follows, Such a path may therefore readily be obtained by a simulation starting at x e and following ẋ = −∇U − f U .While we expect paths described by (35) to match exactly in the orthogonal case, for sub-orthogonal cases we hope that there may still be close agreement, depending on the degree of orthogonality.We therefore compare the predicted and exact paths, evaluating the true MAPs via our own implementation of the optimization approach detailed in [12].A comparison of paths from the fixed point x o = (−1.0,0.0) is given in figure 3.In the gradient case, the paths match exactly, as expected.For the first non-gradient cases, there is clearly some disagreement, although the sense of curvature of the paths is generally the same for the predicted and true MAPs.
While the agreement of the paths may seem quantitatively poor, these paths may still be used to provide an estimate of the true quasi-potential.Given that the quasi-potential is defined in terms of the infimum of the action over all possible paths, the action of any path that we choose can provide an upper bound to the true quasipotential, evaluated via the geometric minimum actio method [10].While most choices of possible path will give an action much greater than that of the MAP, the SOS predicted path may be expected to give a tighter bound, owing to the qualitative similarity with the true MAP.In addition to this upper bound on the true quasi-potential, the potential from the decomposition itself provides a lower bound, as discussed in section II.
A quantitative comparison between the true quasipotential Q, the potential from the decomposition U and that from the SOS predicted path S is given in figure 4. In the gradient case there is almost exact agreement between all three methods, demonstrating that the obtained decomposition provides both the true quasipotential and predicts the correct paths.In the first non-gradient case, the upper bound remains remarkably tight, despite the large differences between the true and predicted paths.For the second non-gradient case the bounds are even tighter, with an exact match along the x-axis.In all cases, each of U and S can be seen to indeed be lower and upper bounds respectively.

VI. CONCLUSIONS
Evaluating the quasi-potential for linear and nonlinear SDEs is a challenging problem, yet one with significant interest and motivation.In this work we have provided a novel method for the calculation of the quasi-potential based upon the Sum of Squares method for constructing Lyapunov functions.Our method is applicable to systems for which the governing equations are polynomial and involves solving an optimization over the coefficients of a polynomial potential function.
The construction of an informative landscape is motivated by three key requirements.Firstly, we would like the potential to correctly capture the basins of attraction for the deterministic system.Such a requirement is equivalent to the potential being a Lyapunov function, and is therefore naturally achieved by our method.Secondly, it is desirable to have an estimate of the most probable transition trajectories between basins of attraction; the so-called minimum action paths.For cases permitting an orthogonal decomposition of the dynamics, the paths may readily be obtained from the two vector-field components.For cases in which the decomposition is sub-orthogonal, the obtained paths may provide a moreor-less accurate approximation, suitable for use as an initial guess in an optimization routine.Finally, we may wish for the potential function to be a quasi-potential for the system, accurately describing the transition probabilities for situations of vanishingly small noise.Again, for systems permitting an orthogonal decomposition of the dynamics our method calculates exactly such a quasipotential, while for sub-orthogonal cases the potential may be used to provide both a lower and upper bound.
The first key limitation of the method is in its applicability to only polynomial systems.However, such systems are commonplace in e.g.mass action models of chemical kinetics and linear models for dynamical systems.A closely related limitation is in the ability of the method to express the potential itself in terms of a polynomial.It is plausible that in some cases even if the governing equations are polynomial, a potential satisfying the normal decomposition must be expressed in some expanded basis beyond monomial terms.Regardless, our polynomial sub-orthogonal potential still provides useful bounds.
A second key limitation is that the obtained quasipotential is only valid for systems with additive noise, in which the noise tensor is equal to the identity matrix.Yet this is a common approximation, especially when the magnitude of the noise tends to zero, and furthermore can always be achieved for linear systems via a coordinate transform.If multiplicative noise is present but may expressed as a polynomial, it is possible that in some cases this could be incorporated into the algorithm, however this is beyond the scope of this study.
While we have provided a method that generates an analytical expression for the quasi-potential for a particular subset of SDE systems, it remains unclear if such a feat can be achieved in more general cases.Ultimately, we hope that the method provides another useful tool for the interpretation and analysis of stochastic dynamical systems, and may pave the way for more general methods to obtain the quasi-potential in a symbolic manner.

FIG. 1 .
FIG. 1.A schematic of the lower bounding method.(a) The potential function in the region of the fixed points and (b) a comparison between the potential and lower bound.

TABLE I .
Coefficients obtained for the Maier-Stein model for three different parameter combinations.