Correlating thermal machines and the second law at the nanoscale

Thermodynamics at the nanoscale is known to differ significantly from its familiar macroscopic counterpart: the possibility of state transitions is not determined by free energy alone, but by an infinite family of free-energy-like quantities; strong fluctuations (possibly of quantum origin) mean that less work can be reliably extracted than one would expect from the free energy difference. However, these known results rely crucially on the assumption that the thermal machine is not only exactly preserved in every cycle, but also kept uncorrelated from the quantum systems on which it acts. Here we lift this restriction: we allow the machine to become correlated with the microscopic systems on which it acts, while still exactly preserving its own state. Surprisingly, we show that this restores the second law in its original form: free energy alone determines the possible state transitions, and the corresponding amount of work can be invested in or extracted from single systems exactly and without any fluctuations. At the same time, the work reservoir remains uncorrelated from all other systems and parts of the machine. Thus, microscopic machines can increase their efficiency via clever "correlation engineering" in a perfectly cyclic manner, which is achieved by a catalytic system that can sometimes be as small as a single qubit (though some setups require very large catalysts). Our results also solve some open mathematical problems on majorization which may lead to further applications in entanglement theory.


I. INTRODUCTION
Thermodynamics, as it is presented in the textbooks, is usually concerned with macroscopic physical systems, like large ensembles of weakly interacting gas molecules. In this regime, the law of large numbers renders fluctuations mostly irrelevant, and one obtains very precise statistical predictions simply by computing averages. One of the most important quantities in this regime is the Helmholtz free energy,

$$F(\rho) = \langle E\rangle_\rho - k_B T\, S(\rho),$$

where ⟨E⟩_ρ is the average energy of the system in state ρ, and S is its entropy. At constant ambient temperature T and constant volume, a transition between two states is possible if and only if the free energy does not increase, i.e. the free energy of the final state minus that of the initial state satisfies ∆F ≤ 0. The free energy difference also tells us how much work we can extract, or need to invest, during a thermodynamic state transition.
However, this formulation of the second law applies only in the thermodynamic limit of large numbers of identically distributed or weakly interacting particles. In contrast, modern technology allows us to probe and manipulate physical systems at much smaller scales [1][2][3][4], where quantum fluctuations and strong correlations may dominate. Understanding the subtleties of thermodynamics in this regime will also be relevant for some biological processes [5][6][7], since evolutionary pressure tends to force microscopic machines to act as efficiently as possible in thermal environments.
With this motivation in mind, based on the techniques and ideas of quantum information theory, an approach to small-scale thermodynamics has recently been developed which has demonstrated [9,10] that the free energy F loses its role as the unique indicator of state transitions in the microscopic regime. Instead, a family of "α-free energies" F_α determines the possibility of thermodynamic transformations: a transition is possible if and only if ∆F_α ≤ 0 for all α > 0.
In the special case α = 1, we obtain the standard Helmholtz free energy, F_1 = F. This recovers the usual second law, ∆F ≤ 0, as a special case of an infinite family of "second laws". Moreover, the maximal amount of work that can be reliably extracted from a state ρ in contact with a heat bath is given by F_0(ρ) + k_B T log Z, while the minimal amount of work that one has to invest to prepare a state becomes F_∞(ρ) + k_B T log Z, with Z the partition function and k_B Boltzmann's constant. In general, F_0(ρ) < F(ρ) < F_∞(ρ), which shows that thermodynamics loses an important reversibility property at the nanoscale: the amount of work needed to create a state exceeds the amount of work that can be extracted from that state. Intuitively, it is the appearance of fluctuations of the order of the free energy itself that is responsible for this effective irreversibility [29]. It is only in the thermodynamic limit that all F_α become effectively close to F = F_1, which recovers standard macroscopic thermodynamics [10,27,30].
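The ordering F_0 ≤ F ≤ F_∞ can be checked numerically for block-diagonal states. The following is a minimal sketch, assuming the definition F_α(ρ) = k_B T S_α(ρ‖γ) − k_B T log Z with S_α the Rényi divergence relative to the thermal state γ (the form used in [10]); the function names and the example qubit are illustrative choices, not taken from the text.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) in nats; q must be strictly positive."""
    if alpha == 0:
        # Min-relative entropy: minus the log of the q-weight on the support of p.
        return -math.log(sum(qi for pi, qi in zip(p, q) if pi > 0))
    if alpha == 1:
        # Limit alpha -> 1: Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    if math.isinf(alpha):
        # Limit alpha -> infinity: max-relative entropy.
        return math.log(max(pi / qi for pi, qi in zip(p, q)))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


def alpha_free_energy(p, energies, alpha, kT=1.0):
    """F_alpha(p) = kT * D_alpha(p || gibbs) - kT * log Z for occupation probabilities p."""
    weights = [math.exp(-e / kT) for e in energies]
    Z = sum(weights)
    gibbs = [w / Z for w in weights]
    return kT * renyi_divergence(p, gibbs, alpha) - kT * math.log(Z)


# Hypothetical qubit: occupation probabilities and energy levels chosen for illustration.
p = [0.9, 0.1]
energies = [0.0, 1.0]
F0 = alpha_free_energy(p, energies, 0)
F1 = alpha_free_energy(p, energies, 1)          # equals <E> - kT * S
Finf = alpha_free_energy(p, energies, math.inf)
print(F0, F1, Finf)
```

For α = 1 the function reduces to the Helmholtz free energy ⟨E⟩ − k_B T S, which gives a quick consistency check of the assumed definition.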
Yet, these recent results all rely on a specific assumption which is, as we will argue, unnecessary in many important physical situations. To understand this assumption, consider transforming a state ρ_A of a physical system A to another state ρ'_A in the presence of a heat bath (see the caption of Figure 1 for more details). This is usually modelled by introducing another system, a thermal machine M containing a "catalyst" σ_M, such that

$$\rho_A \otimes \sigma_M \to \rho'_A \otimes \sigma_M \tag{1}$$

via some suitable thermal operation. Crucially, the machine starts and ends in the same state σ_M, which means that it is retained in its original form and can be reused, which is essential for a thermodynamic cycle. But we see that, in addition to this crucial property, a further assumption is made: namely, that A and M end up in a product state and do not become correlated.

Figure 1: [...] [10]. We have a system A that we would like to act on, by transforming its state ρ_A into another state ρ'_A. We have access to a heat bath with an arbitrary Hamiltonian H_B, which is in its thermal state γ_B at some fixed temperature T. The thermal machine contains a quantum system in state σ_M, and it controls a unitary transformation U_AMB (symbolized by the pentagon), acting on the system A, heat bath B, and its internal system M. Crucially, this transformation is fully energy-preserving, i.e. [U_AMB, H_A + H_M + H_B] = 0. By tracing over the heat bath, we obtain the map σ_AM = Tr_B[U_AMB (ρ_A ⊗ σ_M ⊗ γ_B) U_AMB†], which is, in total, a thermal operation, ρ_A ⊗ σ_M → σ_AM. We demand that the machine's internal state σ_M is exactly preserved (hence σ_M is often called a "catalyst": it enables the transformation, but is not consumed in the process), and we would like the resulting state of A, Tr_M σ_AM, to be identical to (or very close to) the desired target state ρ'_A. The difference from [10] is that we allow correlations to build up between A and M. If work is spent or extracted, we model this by an additional two-level system ("work bit") W which, initially as well as finally, is enforced to be exactly in an energy eigenstate (ground state |g⟩ or excited state |e⟩). This ensures that W remains uncorrelated with all other systems that are involved in this process, hence the resulting work ∆ can be reliably transferred to or from an external battery.
Arguably, there are many situations in which this additional assumption is unwarranted. For example, imagine a microscopic machine that acts on a myriad of small quantum systems, one after the other (say, a stream of particles), and builds up correlations with them while doing so. As long as the machine encounters every system only once, these correlations will not spoil the working of the machine on further systems. This motivates us to consider more general transformations of the form

$$\rho_A \otimes \sigma_M \to \sigma_{AM}, \tag{2}$$

where the reduced final states are σ_A = ρ'_A on A and σ_M on M. That is, the machine's state becomes correlated with the system on which it has acted, but it is locally exactly preserved and can be used again on other systems on which it has not acted before.
Below, we will show that this setting surprisingly restores the standard second law: it is the Helmholtz free energy F that uniquely determines the possible state transitions. In particular, machines that act according to this more general prescription gain a significant advantage: they can essentially tame all fluctuations, and invest or extract the free energy difference with perfect reliability even when operating on single or strongly correlated quantum systems. In some cases, very small catalysts M can already lead to significant improvements of efficiency.
This result answers a major open question of [31] in the positive: Helmholtz free energy becomes the "unique criterion for the second law of thermodynamics". It is related to the insights of [32], but goes far beyond them: instead of correlating several auxiliary systems, here the machine becomes correlated with the system on which it acts (but remains otherwise intact), which is arguably a much more natural situation relevant to thermodynamics. The results of this paper also provide new insights into majorization theory, solving several open problems in that field, which may have further applications in entanglement theory [33]. Namely, majorization determines the possibility of state interconversion for pure bipartite quantum states via local operations and classical communication [34], and standard catalysis is known to enhance the possible transitions [35]. Since further thermodynamics-related concepts have recently been translated into this entanglement setting [33], we think that the results of the present paper may have interesting implications in this context too. Furthermore, in contrast to earlier results [36], the insights of this paper potentially continue to hold in the presence of quantum coherence (see the conjecture in Subsection II E).

A. Known results without correlation
We are working within a framework for thermodynamics that is motivated by quantum information theory. This framework formulates thermodynamics as a resource theory [27,28]: given any state of a physical system, together with a set of rules that constrain the agent's actions (e.g. global energy conservation), a resource theory asks for the ultimate limits of what is possible, e.g. how much work the agent can extract or what state transitions she can enforce. A sketch of the setup is given in Figure 1 (for now, ignore the "work bit" W). We have a collection of quantum systems that each come with their own Hamiltonians. This includes a microscopic system A, typically out of equilibrium. We would like to transform its quantum state ρ_A into another state ρ'_A, while possibly extracting or investing some work ∆ ≥ 0. This will be achieved with the help of a thermal machine, as explained in the caption of Figure 1. Crucially, all processes preserve the total energy exactly (not only its expectation value), and are performed in the presence of a heat bath at fixed temperature T. Microscopic reversibility is ensured by modelling global transformations as unitary operations.
As in most previous work (including [9] and [10]), we assume that the decoherence time is much smaller than the thermalization time. This amounts to assuming that all states are block-diagonal in energy (i.e. [ρ_X, H_X] = 0 for all involved quantum systems X), which applies to a large variety of situations in physics, including ones traditionally studied in the context of Landauer erasure [37,38]. In this semiclassical regime, the state of any system is characterized by the occupation probabilities of the different energy levels; the state is thermal if these probabilities are given by the Boltzmann distribution. It has recently been shown that coherence significantly complicates the picture [36,[39][40][41]]; studying the semiclassical regime is therefore a crucial first step even if one is interested in the more general situation with coherence. We thus defer the treatment of quantum coherence to future work, but discuss some evidence that our main result could still hold in the presence of coherence in Subsection II E.
In order to account very carefully for all contributions of energy and entropy, we assume that the machine can strictly only perform the following operations: energy-preserving unitaries; accessing thermal states from the bath; and ignoring heat bath degrees of freedom by tracing over them. This results in a class of transformations called thermal operations which have the form stated in the caption of Figure 1. If we assume for the moment that there is no work reservoir W, and demand that these operations preserve the local state of the machine M and also its independence from A, then they describe transitions ρ_A → ρ'_A as in equation (1). It has recently been shown [10] that a thermal transformation can achieve this transition (up to an arbitrarily small error on A) if and only if all α-free energies decrease in the process:

$$F_\alpha(\rho_A) \ge F_\alpha(\rho'_A) \quad \text{for all } \alpha > 0. \tag{3}$$

Here F_α(ρ_A) = k_B T S_α(ρ_A‖γ_A) − k_B T log Z_A, with S_α the Rényi divergence of order α relative to the thermal state γ_A of A, and Z_A the partition function of A.

To see that the α-free energies impose severe constraints on the workings of a thermal machine, let us look at a simple example. Suppose that a thermal machine is supposed to heat up a system A from its thermal state (of ambient temperature T) to infinite temperature. If A is some two-level system with energies 0 and E_A, and the temperature is such that E_A = k_B T log 2, then the thermal state of A has occupation probabilities (2/3, 1/3), while the infinite-temperature target state has occupation probabilities (1/2, 1/2). The associated work cost will be delivered by an additional work bit W with energy gap ∆ > 0.
It starts in its excited state |e⟩ and will end up in its ground state |g⟩. The machine tries to implement the transition

$$\gamma_A \otimes |e\rangle\langle e|_W \to \tfrac{1}{2}\mathbb{1}_A \otimes |g\rangle\langle g|_W$$

with a work cost ∆ that is as small as possible. As before, this is achieved by a catalytic thermal operation of the form (1) and Figure 2. What is the minimal amount of work needed, i.e. the smallest possible ∆? The α-free energy difference (see Appendix or [10] for the definition) between initial and final state of AW turns out to be

$$\frac{\Delta F_\alpha}{k_B T} = \frac{1}{\alpha - 1}\log\left[2^{-\alpha}\left(\left(\tfrac{2}{3}\right)^{1-\alpha} + \left(\tfrac{1}{3}\right)^{1-\alpha}\right)\right] - \frac{\Delta}{k_B T},$$

which is increasing in α. Thus, this is ≤ 0 for all α if and only if ∆F_∞ ≤ 0, which becomes

$$\Delta \ge k_B T \log\tfrac{3}{2} \approx 0.4\, k_B T. \tag{4}$$

This is the ultimate limit for a transition as shown in Figure 2 to be successfully implementable. On the other hand, the standard free energy difference is ∆F/(k_B T) = log 3 − (3/2) log 2 − ∆/(k_B T), and for this to be ≤ 0 we must have

$$\Delta \ge k_B T \left(\log 3 - \tfrac{3}{2}\log 2\right) \approx 0.06\, k_B T.$$

Thus, textbook thermodynamic reasoning would suggest that 0.06 k_B T of energy should be sufficient for the state transition; however, our analysis has shown that the machine needs to spend considerably more work, namely 0.4 k_B T. As explained above, one reason for this is that we are dealing with the case of a single system only. The standard thermodynamic equations apply to large numbers of (independent, or weakly correlated) identical systems and their averages. That is, if ∆^(n) is the energy (for example, energy gap of the work bit) that is needed to approximately achieve the transition

$$\gamma_A^{\otimes n} \otimes |e\rangle\langle e|_W \to \left(\tfrac{1}{2}\mathbb{1}_A\right)^{\otimes n} \otimes |g\rangle\langle g|_W,$$

then ∆^(n) ≈ n∆F (here 0.06 n k_B T) as n becomes large (up to corrections that are sublinear in n), as shown, for example, in [27,44]. Intuitively, by acting collectively on a large number of particles, a machine can achieve more than if it had to act on each particle separately. This phenomenon is again related to versions of the law of large numbers, which results in quantities becoming sharply peaked around their averages in large ensembles.

Figure 2: Example of work cost scenario without allowing correlations to build up. A qubit A, initially in equilibrium, is supposed to be heated up to infinite temperature by spending some energy ∆ and by using a (potentially large) catalytic system M that remains uncorrelated with AW (and unchanged by the process). A transition of this form is only possible at work cost of at least ∆ ≈ 0.4 k_B T.

This is bad news for the machine: what if it is essential for the given physical setup that the specific single instance of A is being heated, and that very little work is spent in this process? A glance at Figure 2 can guide us towards a solution: whatever transition we have there, it must come from a thermal operation that is being performed globally on the MAW system. While doing so, the thermal machine had better take care to preserve the state of M so that it can be reused in the future. But the way we have formulated catalytic thermal operations so far introduces yet another complication for the working of the machine: it must keep M uncorrelated from AW. This seems hard and overly constraining, given that interaction typically creates correlation.

Figure 3: [...] even though the catalyst M consists only of a single qubit.
We thus have two independent motivations to allow correlations between M and AW: the difficulty of avoiding correlations during interaction, and the desire to achieve higher efficiency. We will now show that the latter goal can indeed be achieved by allowing correlations to build up, even if the catalyst M is as small as a single qubit. Suppose that M has a trivial Hamiltonian, H_M = 0, and two basis states |0⟩ and |1⟩ (both of energy zero). Denote ground and excited state of A by |g⟩_A and |e⟩_A, and consider the correlated state ρ_AM [equation missing]. By computing the partial trace, we find that ρ_A is indeed the infinite-temperature state, and ρ_M [equation missing], which will also be our local qubit catalyst state σ_M. Thus, if we enforce thermal transitions of the form

$$\gamma_A \otimes \sigma_M \otimes |e\rangle\langle e|_W \to \rho_{AM} \otimes |g\rangle\langle g|_W,$$

then A will be heated up, the local reduced state of M will be preserved, and correlations will build up between A and M (note that there cannot be any correlations with W since it is in a pure state). Now, as we show in Appendix III D, this transition can be achieved by a thermal operation (without the need for any additional "standard" catalysts), investing only ∆ [value missing] of work. That is, the single qubit catalyst allows us to save about 1/3 of the total work cost as compared to (4). One can easily imagine situations in which this represents a decisive physical advantage.
In the remainder of the paper, we will explore the ultimate limitations of this kind of "correlating" catalysis. We will show that these limitations are uniquely determined by Helmholtz free energy. That is, by using other suitable catalysts in the example above, one can get as close to ∆ = ∆F ≈ 0.06 k_B T as one wishes (but not below), at the price of having a possibly large catalyst at hand (which can however be reused).
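The two work thresholds of the heating example (about 0.4 k_B T from ∆F_∞ and about 0.06 k_B T from ∆F) can be reproduced numerically. The sketch below assumes F_α = k_B T S_α(·‖γ) − k_B T log Z with S_α the Rényi divergence, a qubit with E_A = k_B T log 2, and units with k_B T = 1; the helper names are illustrative.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) in nats for alpha = 1, infinity, or generic alpha > 0."""
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    if math.isinf(alpha):
        return math.log(max(pi / qi for pi, qi in zip(p, q)))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


def product(a, b):
    """Probability vector of a product state."""
    return [x * y for x in a for y in b]


def delta_F_alpha(alpha, delta, kT=1.0):
    """Change of the alpha-free energy of AW for thermal A, excited W -> mixed A, ground W."""
    gamma_A = [2 / 3, 1 / 3]                  # thermal qubit with E_A = kT log 2
    mixed_A = [1 / 2, 1 / 2]                  # infinite-temperature target
    z_W = 1 + math.exp(-delta / kT)
    gamma_W = [1 / z_W, math.exp(-delta / kT) / z_W]   # (ground, excited)
    ground, excited = [1.0, 0.0], [0.0, 1.0]
    gamma_AW = product(gamma_A, gamma_W)
    initial = product(gamma_A, excited)
    final = product(mixed_A, ground)
    # The log Z terms cancel in the difference.
    return kT * (renyi_divergence(final, gamma_AW, alpha)
                 - renyi_divergence(initial, gamma_AW, alpha))


# With delta = 0 the function returns the minimal work allowed by each condition.
single_shot_cost = delta_F_alpha(math.inf, 0.0)   # log(3/2), about 0.405
textbook_cost = delta_F_alpha(1, 0.0)             # log 3 - 1.5 log 2, about 0.059
print(single_shot_cost, textbook_cost)
```

Because the work bit is in an energy eigenstate before and after, it contributes exactly −∆ to every ∆F_α, so the α-dependence comes from the qubit A alone.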

C. Correlating state transformations in general
Under what conditions can a state transition as in the example above be achieved? For the moment, let us assume that there is no work bit W (we will reintroduce W in the next subsection). In order to implement the transition (2) with a thermal operation, it is still necessary that ∆F_α ≤ 0 on the joint system AM for all α, since this is a necessary condition for all thermal operations. In the uncorrelated case, eq. (1), the same inequalities follow for system A alone, since F_α(ρ_A ⊗ σ_M) is simply the sum F_α(ρ_A) + F_α(σ_M). But in the correlated case, the situation is different. In this case, it turns out that there are two special values of α, namely α = 0 and α = 1, for which F_α has the important property of superadditivity: that is,

$$F_\alpha(\sigma_{AM}) \ge F_\alpha(\sigma_A) + F_\alpha(\sigma_M) \quad (\alpha \in \{0, 1\}).$$

This allows us to obtain two conditions on the state of A alone, starting with the non-increase of F_α on AM:

$$F_\alpha(\rho_A) + F_\alpha(\sigma_M) = F_\alpha(\rho_A \otimes \sigma_M) \ge F_\alpha(\sigma_{AM}) \ge F_\alpha(\sigma_A) + F_\alpha(\sigma_M).$$

Thus, since σ_A = ρ'_A, we conclude that

$$F_0(\rho_A) \ge F_0(\rho'_A) \quad \text{and} \quad F(\rho_A) \ge F(\rho'_A).$$

But the other F_α are not in general superadditive, as emphasized in [31,32,62], see also [66,69]. Hence we cannot draw an analogous conclusion for the other α-free energies. Moreover, the condition F_0(ρ'_A) − F_0(ρ_A) ≤ 0 is arguably physically irrelevant for the purpose of this subsection, as a glance at its definition shows: we have

$$F_0(\rho_A) = k_B T\, S_0(\rho_A \| \gamma_A) - k_B T \log Z_A$$

(the "min-free energy" from [9]), where S_0(ρ_A‖γ_A) = −log tr(π_{ρ_A} γ_A) is the "min-relative entropy" from quantum information theory [43], with π_{ρ_A} the projector onto the support of ρ_A. This is a discontinuous quantity which takes its minimal value whenever the state has full rank, i.e. no energy level has probability zero. Since there is no essential physical difference between zero population and extremely small nonzero population, we can ensure that the target state ρ'_A has full rank by allowing an arbitrarily small error in the transition. Thus, only the standard Helmholtz free energy condition ∆F_1 ≡ ∆F ≤ 0 survives as a relevant necessary condition for a correlating state transition. But is it also sufficient, that is, given that it is satisfied, can we in principle always engineer the machine M and its state such that transition (2) is possible? This was conjectured in Ref. [31], and our first main result shows that this is indeed the case:

Main Result 1. Consider some initial state ρ_A and target state ρ'_A, both block-diagonal in energy. In the setting of Figure 1 (without work bit W), the transition

$$\rho_A \otimes \sigma_M \to \sigma_{AM},$$

with σ_A := Tr_M σ_AM arbitrarily close to ρ'_A, can be achieved by a thermal operation if and only if F(ρ_A) ≥ F(ρ'_A), with F the Helmholtz free energy. Note that the state σ_M of the thermal machine M is exactly identical before and after the transformation, and its state space is finite-dimensional. Furthermore, the Hamiltonian on M can be chosen as H_M = 0, and the final correlation between A and M, as measured by the mutual information I(A : M)_σ, can be made arbitrarily small (but not in general zero).
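For α = 1 and block-diagonal states, superadditivity can be made quantitative: with additive Hamiltonians, the gap F(σ_AM) − F(σ_A) − F(σ_M) equals k_B T times the mutual information I(A : M). The following sketch checks this identity numerically; the joint distribution and energy levels are hypothetical values chosen for illustration and are not the states constructed in the proof.

```python
import math

def free_energy(p, energies, kT=1.0):
    """Helmholtz free energy <E> - kT * S for occupation probabilities p."""
    avg_E = sum(pi * e for pi, e in zip(p, energies))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return avg_E - kT * entropy


def marginals(joint):
    """Marginals of a joint distribution given as rows (index A) by columns (index M)."""
    p_A = [sum(row) for row in joint]
    p_M = [sum(col) for col in zip(*joint)]
    return p_A, p_M


def mutual_information(joint):
    p_A, p_M = marginals(joint)
    return sum(x * math.log(x / (p_A[i] * p_M[j]))
               for i, row in enumerate(joint)
               for j, x in enumerate(row) if x > 0)


def joint_free_energy(joint, E_A, E_M, kT=1.0):
    flat = [x for row in joint for x in row]
    energies = [ea + em for ea in E_A for em in E_M]   # non-interacting Hamiltonians
    return free_energy(flat, energies, kT)


# Hypothetical correlated state of a qubit A and a qubit catalyst M with H_M = 0.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
E_A = [0.0, math.log(2)]
E_M = [0.0, 0.0]

p_A, p_M = marginals(joint)
gap = (joint_free_energy(joint, E_A, E_M)
       - free_energy(p_A, E_A) - free_energy(p_M, E_M))
print(gap, mutual_information(joint))
```

The gap is strictly positive exactly when A and M are correlated, which is the slack that a correlating machine can exploit while keeping its local state fixed.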
The proof is sketched in Subsection II F, and given in full detail in the Appendix. As in earlier work, the catalyst σ_M will in general depend on the initial and final states ρ_A, ρ'_A and on the Hamiltonian H_A; it will also depend on the amount of correlation I(A : M)_σ that the agent is willing to allow to build up. Therefore, we should think of the thermal machine in Figure 1 as containing a large collection of different catalysts σ_M. Depending on the situation, the machine will apply the corresponding suitable catalyst. Doesn't the agent have to "know the system state ρ_A" to apply her machine accordingly? The answer to this question is that the state ρ_A is supposed to model the agent's knowledge of the system A in the first place, and this interpretation is chosen implicitly in most works in the present context. For example, the energy cost in Landauer erasure [37,38] does not necessarily rely on an objective "delocalization" of a particle in two halves of a box, but is simply due to the agent's missing knowledge about whether it will be detected in the left or the right half in any single experimental run. Consequently, the agent can always choose the catalyst that suits her knowledge of the system as encoded in her state description.
What can we say about the size of the catalyst σ_M? As we have shown by example in Subsection II B, in some cases the catalyst can be as small as a qubit and still allow for substantial advantages as compared to the standard "non-correlating" notion of catalysis. Main Result 1 formalizes the ultimate possibilities and limitations of thermal machines acting on single small quantum systems, without aiming at the use of "realistic" catalysts. Thus, in the proof, we will take advantage of constructing "custom-tailored" catalysts that can generically be very large. This is not different, however, from the case of standard catalysis [48,49]. We leave the problem of finding efficient implementations of the catalysts for future work.

D. Correlating work cost in general
We now consider the more general situation that we have an additional work reservoir, containing some energy ∆ ≥ 0 that we may spend in addition to achieve the state transition. As depicted in Figure 1, this is modelled by a "work bit" W, a two-level system with energy gap ∆, that will transition from its excited state |e⟩ to its ground state |g⟩ during this process. An example has been discussed in Subsection II B above.
We imagine that this work bit is part of a larger "ladder" of energy levels which we can charge or discharge like a battery in between thermodynamic cycles. It is therefore crucial to demand that the work bit W does not become correlated with the other parts of the machine M. One way to ensure this is to demand that W is always exactly, and not just approximately, in an energy eigenstate. It turns out that we can always achieve this behavior:

Main Result 2. Consider some initial state ρ_A and target state ρ'_A, both block-diagonal, such that F(ρ'_A) ≥ F(ρ_A). Using a work bit W with some energy gap ∆ larger than, but arbitrarily close to F(ρ'_A) − F(ρ_A), the transition

$$\rho_A \otimes \sigma_M \otimes |e\rangle\langle e|_W \to \sigma_{AM} \otimes |g\rangle\langle g|_W$$

can be achieved by a thermal operation, where σ_A := Tr_M σ_AM is arbitrarily close to ρ'_A. As in Main Result 1, the state σ_M is exactly identical before and after the transformation, M is finite-dimensional, and the resulting correlations between A and M can be made arbitrarily small.
The method to engineer this transition is very similar to that of Main Result 1, except for one important difference: since we are interested in producing a pure state |g⟩ exactly, we have to make sure that the min-free energy F_0, which depends only on the rank of the state, is non-increasing in the process. But this holds automatically because F_0(|g⟩⟨g|_W) − F_0(|e⟩⟨e|_W) = −∆ < 0 if ∆ > 0. Thus, the min-free energy introduces no new constraints in the case that we use work to form a state ρ'_A. The "correlating work cost" is given by the Helmholtz free energy difference F(ρ'_A) − F(ρ_A).

E. Correlating work extraction, and an open problem
Consider the converse situation: given an initial state ρ_A and a target state ρ'_A such that F(ρ_A) ≥ F(ρ'_A), we would like to extract work by transforming a work bit from its ground state |g⟩⟨g|_W to its excited state |e⟩⟨e|_W. Here we encounter a problem: since ρ_A will in general have full rank, the work bit alone lower-bounds the min-free energy difference of the corresponding transition, namely ∆F_0 = F_0(|e⟩⟨e|_W) − F_0(|g⟩⟨g|_W), and this is a positive amount if the energy gap ∆ is positive.
Thus, unfortunately, the min-free energy condition ∆F_0 ≤ 0 forbids this transition. If we still insist on producing the excited state exactly (for the reasons explained in Subsection II D), we need an additional resource: namely, a sink S for the corresponding entropy S_0(ρ) = log rank(ρ), the "max entropy". A max entropy sink S carries a trivial Hamiltonian, H_S = 0, such that S_0(ρ_S‖γ_S) = log d_S − S_0(ρ_S), where d_S is the Hilbert space dimension of S. Thus, we can extract min-free energy by dumping max entropy S_0 into S, which can be achieved by increasing the rank of the state of S. For example, if S carries a state τ_S^(m,n) with eigenvalues

$$\Big(\underbrace{\tfrac{1}{m}, \ldots, \tfrac{1}{m}}_{m}, \underbrace{0, \ldots, 0}_{n-m}\Big)$$

and this state is transformed into τ_S^(m,n,ε) (for some small ε > 0) with eigenvalues

$$\Big(\underbrace{\tfrac{1-\varepsilon}{m}, \ldots, \tfrac{1-\varepsilon}{m}}_{m}, \underbrace{\tfrac{\varepsilon}{n-m}, \ldots, \tfrac{\varepsilon}{n-m}}_{n-m}\Big),$$

this extracts min-free energy ∆F_0 = k_B T log(n/m) from S. Since ε > 0 can be arbitrarily close to zero, and ∆F_0 does not depend on ε, this changes the physical state of S by an arbitrarily small amount. Thus, we obtain the following:

Main Result 3. Consider some initial state ρ_A and target state ρ'_A, both block-diagonal, such that F(ρ_A) ≥ F(ρ'_A). Using a work bit with energy gap ∆ smaller than, but arbitrarily close to F(ρ_A) − F(ρ'_A), we can implement the following transition with a thermal operation, which extracts work ∆ without any fluctuations:

$$\rho_A \otimes \sigma_M \otimes \tau_S^{(m,n)} \otimes |g\rangle\langle g|_W \to \sigma_{AMS} \otimes |e\rangle\langle e|_W.$$

Here σ_M = Tr_AS σ_AMS remains identical during the transformation, σ_S = τ_S^(m,n,ε), and σ_A is as close to ρ'_A as we like. This can be achieved for any choice of ε > 0, as long as n/m is large enough.
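The bookkeeping for the max entropy sink can be sketched in a few lines. The code below assumes a sink with trivial Hamiltonian (so its thermal state is uniform on n levels), an initial state that is uniform on m levels, and a final state that moves total weight ε onto the remaining n − m levels; the function names and the even spreading of ε are illustrative assumptions.

```python
import math

def min_relative_entropy_to_uniform(p):
    """S_0(p || uniform) = log d - log rank(p) for a system with trivial Hamiltonian."""
    d = len(p)
    rank = sum(1 for x in p if x > 0)
    return math.log(d) - math.log(rank)


def tau(m, n, eps=0.0):
    """Sink state: m levels share weight 1 - eps, the other n - m levels share eps."""
    head = [(1 - eps) / m] * m
    tail = [eps / (n - m)] * (n - m)
    return head + tail


def extracted_min_free_energy(m, n, eps, kT=1.0):
    """Min-free energy released when tau(m, n) is turned into tau(m, n, eps)."""
    before = min_relative_entropy_to_uniform(tau(m, n))
    after = min_relative_entropy_to_uniform(tau(m, n, eps))
    return kT * (before - after)


# The released min-free energy depends on n/m only, not on how small eps is.
print(extracted_min_free_energy(2, 8, 1e-3))
print(extracted_min_free_energy(2, 8, 1e-9))
```

Both calls return k_B T log 4 in these units, which illustrates why an arbitrarily small perturbation of the sink suffices to offset the rank constraint on the work bit.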
Since the state of the max entropy sink S remains almost unchanged, the agent may measure the state of the sink after the transition, by checking whether its configuration is one of the (n − m) basis states which have probability zero in the initial state τ_S^(m,n). With probability 1 − ε, this will yield the answer "no" and restore the original state τ_S^(m,n) due to state updating. However, even if ε > 0 is very small, a large number of repetitions of the thermodynamic cycle will eventually lead to failure of the protocol.
In other words, the case of work extraction suffers from a deficit that is not present in the case of formation of a state: it admits only a weaker notion of cyclicity. An additional max entropy sink is needed, and its state is not reset with unit probability after every cycle. It is well-known that allowing small deviations from cyclicity can lead to quite implausible and unphysical effects like embezzling of work [10,70]. Thus, we consider Main Result 3 as only a preliminary answer to the question of the ultimate limits of work extraction in the setup of this paper. Note that the authors of [10] use a similar construction to dismiss the F_α-conditions for α < 0.
The main source of the problem is the insistence on producing the excited state |e⟩ exactly. If we allow that this state is only obtained approximately, and possibly correlated with the system M, then we obtain a valid alternative to Main Result 3 without any max entropy sink (simply by applying Main Result 1). The problem is that correlations between W and M may potentially compromise the working of the machine in further cycles. This leads to the question whether it can be ensured that W remains uncorrelated with all other systems even if we drop the condition that it is in an exact eigenstate:

Open Problem. Can we formulate a suitable version of Main Result 3 which allows the state of the work bit to be slightly mixed (dropping the max entropy sink), but which ensures nevertheless that it remains perfectly uncorrelated with all other systems (in particular M)? This should be achieved in a way that allows work to be accumulated over many extraction cycles without degrading its "quality" (fidelity with an eigenstate) and without the need for increasing resources or precision.
We conjecture that the answer is "yes", and that it will lead to the same expression for the amount of work that can be extracted in the correlating scenario of this paper as suggested by Main Result 3, namely F(ρ_A) − F(ρ'_A). A possible approach could be to adopt the methods of [72], and to consider quasistatic "near perfect" work extraction.
The authors of Ref. [73] have recently shown that work can be extracted from passive states if the thermal machine M is allowed to become correlated with the system. However, only work extraction on average was considered (not fluctuation-free single-shot work extraction like in this paper), the extracted work was only modelled implicitly, without the demand that unitaries preserve the total energy, and no heat bath (and thus background temperature) was considered. Thus the Helmholtz free energy F plays no role in [73].

F. Sketch of proof
Before discussing the role of coherence in Subsection II G below, we now give a self-contained sketch of the proof of the main results. It is mostly based on majorization theory and can be skipped by readers who are only interested in the physical discussion. All proof details can be found in the appendix. Given any quantum system X (which may itself be composed of several quantum systems), a thermal operation on X is a map ρ_X → ρ'_X such that there exists a finite-dimensional system B with

$$\rho'_X = \mathrm{Tr}_B\left[U_{XB}\,(\rho_X \otimes \gamma_B)\,U_{XB}^\dagger\right],$$

where [U_XB, H_X + H_B] = 0 for H_X and H_B the Hamiltonians of X and B, and γ_B = exp(−βH_B)/Z is the Gibbs state, with β = 1/(k_B T) and Z the partition function such that tr γ_B = 1 (the temperature T is arbitrary but fixed). Our main results claim that certain state transitions on composite systems are or are not possible via thermal operations. We make use of two technical simplifications to prove these results.
First, since we are only considering states that are block-diagonal in energy eigenbasis (except for Subsection II G), we can represent quantum states $\rho_X$ as probability vectors, $p_X \in \mathbb{R}^m$, where $m = \dim X$ is the dimension of X's Hilbert space, and the entries of $p_X$ are the occupation probabilities of the (ordered) energy levels. A Hamiltonian $H_X$ can then be represented as a vector $H_X = (E_1, \ldots, E_m)$ with energies $E_i$, and it is for many purposes sufficient to consider only unitaries U which correspond to permutations of entries of the probability vector, chosen such that $H_X$ is left invariant. See [28] and [46] for mathematical details.
Second, there is a well-known technique to reduce the study of (block-diagonal) thermal operations to the case where all Hamiltonians of all involved physical systems Y are trivial, $H_Y = 0$. This is achieved via an "embedding map" Γ which, intuitively, reformulates the canonical state on some space as a microcanonical state on another space. This technique has been introduced in [10] and used e.g. in [32] and [36] (the latter reference contains a summary in its Methods section).
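The intuition behind the embedding map can be illustrated with a minimal Python sketch. It assumes the usual form of the map from the literature, in which the i-th probability is spread uniformly over $d_i$ slots; the function name `embed` and the example numbers are hypothetical and not taken from this paper. For a Gibbs state with rational weights $\gamma_i = d_i/D$, the embedded state is the uniform (microcanonical) distribution on $D = \sum_i d_i$ levels.

```python
from fractions import Fraction

def embed(p, d):
    """Assumed embedding map Gamma_d: spread p[i] uniformly over d[i] slots."""
    if len(p) != len(d):
        raise ValueError("p and d must have the same length")
    out = []
    for p_i, d_i in zip(p, d):
        out.extend([Fraction(p_i) / d_i] * d_i)
    return out

# Hypothetical example: a Gibbs state with rational weights gamma_i = d_i / D.
d = [1, 2, 3]
D = sum(d)
gibbs = [Fraction(d_i, D) for d_i in d]

# The canonical state becomes the uniform distribution on D levels.
embedded_gibbs = embed(gibbs, d)

# Any other state stays normalized under the embedding.
embedded_state = embed([Fraction(1, 2), Fraction(1, 2), Fraction(0)], d)
```

Exact rational arithmetic is used so that the uniformity of the embedded Gibbs state can be checked without rounding errors.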
In this simplified situation of trivial Hamiltonians and block-diagonal states, it can be shown that a state $p_X$ on some system X can be transformed into another state $p'_X$ to arbitrary accuracy by a thermal operation if and only if $p_X$ majorizes $p'_X$ [45,65], written $p_X \succ p'_X$, i.e.
$$\sum_{i=1}^k p_i^\downarrow \ge \sum_{i=1}^k p_i'^\downarrow \quad \text{for all } k = 1, \ldots, m,$$
where $p^\downarrow = (p_1^\downarrow, \ldots, p_m^\downarrow)$ denotes the reordering of p in non-increasing order, i.e. $p_i^\downarrow = p_{\pi(i)}$ for some permutation π such that $p_1^\downarrow \ge p_2^\downarrow \ge \ldots \ge p_m^\downarrow$. This prescription does not yet take into account the possibility of having an additional catalyst $c_M$ as in Figure 1. Demanding, as in Subsections II A and II B, that the catalyst remains uncorrelated with the system, we are led to the question under what conditions there exists some probability vector $c_M$ such that
$$p_X \otimes c_M \succ p'_X \otimes c_M. \qquad (5)$$
This question has been answered in [48] and [49]: suppose that $p_X^\downarrow \ne p_X'^\downarrow$ and at least one of them does not contain zeros. Then there exists some state $c_M$ such that (5) holds if and only if $H_\alpha(p) < H_\alpha(p')$ for all $\alpha \in \mathbb{R} \setminus \{0\}$, and $H_{\rm Burg}(p) < H_{\rm Burg}(p')$, where the Rényi entropies $H_\alpha$ [67] and the Burg entropy $H_{\rm Burg}$ [68] are defined as
$$H_\alpha(p) = \frac{\mathrm{sgn}(\alpha)}{1-\alpha} \log \sum_{i=1}^m p_i^\alpha, \qquad H_{\rm Burg}(p) = \frac{1}{m} \sum_{i=1}^m \log p_i.$$
Inverting the embedding Γ, allowing arbitrarily small errors in the production of the target state, and investing a tiny amount of extra work [10] leads to condition (3) for thermal transitions of the form (1), i.e. $\Delta F_\alpha \le 0$ for all α-free energies with α > 0.
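The majorization condition and its catalytic version (5) can be checked numerically. The following Python sketch is only an illustration: the helper names `majorizes` and `tensor` are hypothetical, and the distributions are a standard textbook example of catalysis, not numbers from this paper. Here `p` does not majorize `p_prime`, but it does so after both are tensored with the catalyst `c`.

```python
from fractions import Fraction
from itertools import product

def majorizes(p, q, tol=0):
    """True if p majorizes q: each partial sum of sorted p dominates that of q."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    sum_p = sum_q = 0
    for a, b in zip(sorted(p, reverse=True), sorted(q, reverse=True)):
        sum_p += a
        sum_q += b
        if sum_p < sum_q - tol:  # pass tol > 0 when working with floats
            return False
    return True

def tensor(p, c):
    """Product distribution of p and c, flattened into one list."""
    return [a * b for a, b in product(p, c)]

F = Fraction
p = [F(1, 2), F(1, 4), F(1, 4), F(0)]              # initial state (illustrative)
p_prime = [F(2, 5), F(2, 5), F(1, 10), F(1, 10)]   # target state (illustrative)
c = [F(3, 5), F(2, 5)]                             # uncorrelated catalyst

direct = majorizes(p, p_prime)                          # no catalyst
catalytic = majorizes(tensor(p, c), tensor(p_prime, c)) # condition (5)
```

Exact fractions are used because one of the partial-sum inequalities in the catalytic case is saturated, which would make a floating-point comparison fragile.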
The crucial step for establishing Main Results 1-3 is the following theorem that we prove in detail in the Appendix:
Main Theorem. Let $p, p' \in \mathbb{R}^m$ be probability distributions with $p^\downarrow \ne p'^\downarrow$. Then there exists an extension $p'_{XY}$ of $p' \equiv p'_X$ such that
$$p_X \otimes p'_Y \succ p'_{XY} \qquad (6)$$
if and only if $H_0(p) \le H_0(p')$ and $H(p) < H(p')$. Moreover, for every ε > 0, we can choose Y and $p'_{XY}$ such that the mutual information is
$$I(X:Y) \equiv S(p'_{XY} \,\|\, p'_X \otimes p'_Y) < \varepsilon.$$
The statement of this theorem uses the max entropy (or Hartley entropy) $H_0(p) = \log \#\{i \mid p_i \ne 0\}$, with its quantum version (also used in the main text) $S_0(\rho) = \log \mathrm{rank}(\rho)$, and it uses the notion of an "extension" of a probability distribution $p'$. To this end, we label the system on which $p'$ lives by X, and introduce another (discrete) system Y. An extension of $p'$ is then a joint probability distribution $p'_{XY}$ on the composite system XY such that its marginal on X equals $p'_X$. The mutual information $I(\cdot : \cdot)$ and relative entropy $S(\cdot \| \cdot)$ are defined in the Appendix. An interesting consequence is that, due to the Pinsker inequality [50], the extension $p'_{XY}$ can be chosen arbitrarily close to the product distribution $p'_X \otimes p'_Y$ in trace distance.
Figure 4: The extension $p'_{XY_1}$ of $p'_X$ that is used in the main text to establish sufficiency of the entropy conditions in the Main Theorem. According to (7), the goal is to build an extension such that the blue curve (that is, the α-Rényi entropy balance) attains only positive values. The plot is for m = 3, $\delta = 10^{-3}$, $p = p_X = (\frac{91}{100}, \frac{1}{20}, \frac{1}{25})$, $Y_1 = \mathbb{R}^{n^2+n+1}$ with $n = 10^{15}$ and $p' = p'_X = (\frac{17}{20}, \frac{7}{50}, \frac{1}{100})$. Since $H_\alpha(p) > H_\alpha(p')$ for $0 < \alpha \le \frac{1}{3}$, there does not exist $c_M$ such that (5) holds true, i.e. no standard catalytic thermal operation can transform p into $p'$. Nevertheless, the transition can be achieved by a correlating catalytic thermal operation. The shaded colors show how different entries of $p'_{XY_1}$ are responsible for (the positivity of) different parts of the curve, as explained in the main text. In the limit n → ∞, only positivity at α = 1, i.e. positive balance of Shannon entropy, remains as a necessary condition.
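The entropy comparison quoted in the caption of Figure 4 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `renyi_entropy` is hypothetical, natural logarithms are used, and the sign convention $H_\alpha = \frac{\mathrm{sgn}(\alpha)}{1-\alpha}\log\sum_i p_i^\alpha$ is assumed for negative α. The sample value α = 0.2 is chosen well inside the interval stated in the caption.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy in nats; alpha = 1 is Shannon, alpha = 0 is Hartley."""
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    if alpha == 0:
        return math.log(sum(1 for x in p if x > 0))
    # For alpha < 0 this needs full rank (0 ** negative raises an error).
    s = sum(x ** alpha for x in p)
    return math.copysign(1.0, alpha) / (1.0 - alpha) * math.log(s)

# Distributions from the caption of Figure 4.
p = [0.91, 0.05, 0.04]
p_prime = [0.85, 0.14, 0.01]

# A Renyi entropy that goes the wrong way rules out uncorrelated catalysis ...
violated_at_small_alpha = renyi_entropy(p, 0.2) > renyi_entropy(p_prime, 0.2)
# ... while the Shannon and Hartley conditions of the Main Theorem hold.
shannon_ok = renyi_entropy(p, 1) < renyi_entropy(p_prime, 1)
hartley_ok = renyi_entropy(p, 0) <= renyi_entropy(p_prime, 0)
```

All three flags come out true for these numbers, which matches the statement that the transition is impossible with an uncorrelated catalyst but allowed once correlations are permitted.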
Using the subadditivity [66] of $H_0$ and $H = H_1$, it is very easy to see that $H_i(p) \le H_i(p')$ for i = 0, 1 is necessary for the existence of some $p'_{XY}$ which satisfies (6). The hard part is to show that it is sufficient. To show this, we construct an explicit extension $p'_{XY}$ of $p'_X$ that satisfies (6). This is done in two steps. First, we introduce an auxiliary system $Y_1$ and an extension $p'_{XY_1}$ of $p'_X$ such that
$$H_\alpha(p'_{XY_1}) - H_\alpha(p_X \otimes p'_{Y_1}) > 0 \ \text{ for all } \alpha \in \mathbb{R} \setminus \{0\}, \qquad H_{\rm Burg}(p'_{XY_1}) - H_{\rm Burg}(p_X \otimes p'_{Y_1}) > 0. \qquad (7)$$
The results of [48,49] explained above will then guarantee that there is yet another auxiliary system $Y_2$ with a probability distribution $c_{Y_2}$ such that
$$p_X \otimes p'_{Y_1} \otimes c_{Y_2} \succ p'_{XY_1} \otimes c_{Y_2},$$
and we can simply define $Y := Y_1 Y_2$ and $p'_{XY} := p'_{XY_1} \otimes c_{Y_2}$. The extension $p'_{XY_1}$ is explicitly defined in Figure 4. While we can represent probability distributions $p_X$ on a system X as vectors $p = (p_1, \ldots, p_m) \in \mathbb{R}^m$, we can similarly represent bipartite probability distributions $p_{XY_1}$ as matrices $p_{ij}$, like we do for $p'_{XY_1}$ in Figure 4. Summing over the rows resp. columns gives the marginals $p'_X = (p'_1, \ldots, p'_m)$ and $p'_{Y_1}$, which shows in particular that $p'_{XY_1}$ is indeed an extension of $p'_X$. We choose $Y_1$ to be $(n^2 + n + 1)$-dimensional, whereas X is m-dimensional.
Let us consider the special case that $p'_X$ does not contain zeros (implying $H_0(p) \le H_0(p')$) and that $p_X \ne (\frac{1}{m}, \ldots, \frac{1}{m})$. Suppose that $H(p) < H(p')$. We claim that
$$\lim_{n\to\infty} \left[ H_\alpha(p'_{XY_1}) - H_\alpha(p_X \otimes p'_{Y_1}) \right] = \log m - H_\alpha(p_X)$$
for all α ≠ 1, which can be seen in Figure 4 by the fact that the left-hand side (the blue curve) approaches the right-hand side (the black dashed curve) for large n. In fact, the blue curve is monotonically increasing in n towards the black curve. Since the maximal value of $H_\alpha(p_X)$ is log m, and this is only attained at the uniform distribution, this shows that the blue curve attains strictly positive values away from α ∈ {0, 1} if n is large enough. According to the first condition in (7), this is exactly what we need to achieve.
We can understand why this happens by considering the different intervals of α separately. It turns out that the Rényi entropies $H_\alpha$ in the regime α > 1 are dominated by the largest elements of a probability distribution, which, in this case, are the δ-entries (shaded yellow); all other entries do not contribute much to the value of $H_\alpha$. Since those entries are all equal, the expression $H_\alpha(p'_{XY_1}) - H_\alpha(p'_{Y_1})$ reduces in the limit n → ∞ to log m. On the other hand, for α < 1, it is the smallest entries of the probability distributions that matter, which are the $(\delta/n^2)$-entries (shaded grey), leading to the same conclusion. In fact, this intuition has been used in quantum information theory in the construction of counterexamples to certain versions of the so-called additivity conjecture [52][53][54].
In contrast, for α = 1, the difference of entropies is constant in n and satisfies
$$\lim_{\delta \searrow 0} \left[ H(p'_{XY_1}) - H(p_X \otimes p'_{Y_1}) \right] = H(p') - H(p).$$
This explains why the blue curve in Figure 4 has an n-independent "dip" at α = 1: the value there differs in the limit from those at α < 1 and α > 1. Thus, the dip becomes very narrow as n tends to infinity. The blue curve takes values at α ≠ 1 which are in the limit positive and independent of the target state $p'_X$ and its extension $p'_{XY_1}$; it is only at α = 1 where the value depends on that state and its extension. If we choose δ small enough, we can enforce that the blue curve remains positive also around α = 1 if and only if $H(p') - H(p) > 0$, that is, positivity of the standard Shannon entropy difference survives as the unique condition. One can show that the Burg entropy is related to the derivative of the blue curve at α = 0, and the second condition in (7) is automatically satisfied too, which establishes the first part of the Main Theorem. All remaining details of the proof are given in the Appendix.
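The behaviour of the blue curve described above can be explored numerically. The sketch below is hedged in one important respect: it assumes that each row i of the extension consists of n entries $(p'_i - 2\delta)/n$, $n^2$ entries $\delta/n^2$ and one entry δ. This layout is a reconstruction from the stated dimension $n^2+n+1$ and from the δ- and $(\delta/n^2)$-entries mentioned in the text, not a formula copied from Figure 4. Under that assumption the power sums have a closed form in $x = n^{1-\alpha}$, so very large n can be handled without building the matrix. The function name `entropy_gain` is hypothetical.

```python
import math

def entropy_gain(q, delta, n, alpha):
    """H_alpha(joint) - H_alpha(marginal on Y1) for the assumed extension of q."""
    m = len(q)
    if not 0 < delta < 0.5 * min(q):
        raise ValueError("need 0 < delta < min(q) / 2")
    if alpha == 1:
        # Shannon case: every log(n) term cancels, so n does not appear.
        eta = lambda x: -x * math.log(x) if x > 0 else 0.0
        h_joint = sum(eta(qi - 2 * delta) for qi in q) + 2 * m * eta(delta)
        h_marginal = eta(1 - 2 * m * delta) + 2 * eta(m * delta)
        return h_joint - h_marginal
    x = n ** (1.0 - alpha)
    joint = x * sum((qi - 2 * delta) ** alpha for qi in q) \
        + m * delta ** alpha * (x * x + 1)
    marginal = x * (1 - 2 * m * delta) ** alpha \
        + (m * delta) ** alpha * (x * x + 1)
    return math.copysign(1.0, alpha) / (1.0 - alpha) * math.log(joint / marginal)

# Parameters from the caption of Figure 4 (p is the initial, q the target state).
p = [0.91, 0.05, 0.04]
q = [0.85, 0.14, 0.01]
delta = 1e-3
shannon_p = -sum(x * math.log(x) for x in p)

# Entropy balance at alpha = 1: positive and independent of n.
balance_at_one = entropy_gain(q, delta, 10, 1) - shannon_p
# Away from alpha = 1 the gain grows with n and approaches log(m).
gains = [entropy_gain(q, delta, n, 2.0) for n in (10, 100, 10**4, 10**12)]
```

Subtracting $H_\alpha(p)$ from `entropy_gain` gives the entropy balance plotted as the blue curve, so the limit log m of the gain corresponds to the black dashed curve $\log m - H_\alpha(p)$.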
Main Result 1 is then established by using an inverse of the embedding map Γ, as explained above.The proofs of Main Results 2 and 3 are very similar, except that some care has to be taken that all approximations (which are unavoidable due to the construction of Γ [10]) are chosen without spoiling the purity of the work bit W .These results have thus independent (but very similar) proofs.
As we also show in the Appendix, a simple consequence of the result above is a resolution of an open problem in [42]: in the notation of that paper, it follows that c-trumping for k = 2 is equivalent to c-trumping for k ≥ 3.
Theorem 4 (cf. Appendix). Let $p, q \in \mathbb{R}^m$ be probability distributions with $p^\downarrow \ne q^\downarrow$. Then there exist auxiliary systems B, C and a bipartite distribution $r_{BC}$ such that
$$p_A \otimes (r_B \otimes r_C) \succ q_A \otimes r_{BC}$$
if and only if $H_0(p) \le H_0(q)$ and $H(p) < H(q)$. Here, $r_B$ and $r_C$ denote the marginals of $r_{BC}$.
This also shows that k = 2 systems are enough to use stochastic independence as a resource as described in [32], not only k ≥ 3. We briefly comment on the relation between the present work and [42] after Theorem 4 in the Appendix.

G. Correlation and coherence?
So far, our discussion has focused on block-diagonal states, i.e. states that commute with the total Hamiltonian.In quantum thermodynamics, it is standard to consider this situation first, since transitions between states with coherence are much harder to characterize [36,39,40].In fact, the generic situation is that classification results for block-diagonal states fail to hold in the presence of coherence [74], such as the equivalence of Gibbs-preserving and thermal operations [22].
It is thus remarkable that the result of this paper potentially has a chance to hold in the presence of coherence as well:
Conjecture. Main Result 1 remains true also in the case that $\rho_A$ and/or $\rho'_A$ are not block-diagonal, i.e. in the presence of quantum coherence.
At first sight this may seem implausible: if, for example, $\rho'_A = \sigma_A$ is a pure state, $\sigma_{AM}$ must be a product state, and so the transition in Main Result 1 will simplify to
$$\rho_A \otimes \sigma_M \to \rho'_A \otimes \sigma_M, \qquad (8)$$
which is just a standard catalytic thermal transition as discussed in Subsection II A, subject to the family of "second laws" $\Delta F_\alpha \le 0$ (not just $\Delta F \le 0$). But this ignores that we are in general only interested in producing the target state $\rho'_A$ approximately (though to arbitrary accuracy), such that $\sigma_A \approx \rho'_A$ may in general still be a mixed state, undermining the above counterargument. If $\rho_A$ is incoherent and $\rho'_A$ is not, then a simple argument shows that transitions of the form (8) are impossible. Following [47], define the quantum Fisher information for a system with Hamiltonian H and state ρ as $I(\rho, H) := \mathrm{tr}(\dot\rho\, \Delta_\rho^{-1} \dot\rho)$, where $\dot\rho := i[\rho, H]$ and $\Delta_\rho X := (\rho X + X\rho)/2$. Then $I(\rho, H) = 0$ if and only if ρ is incoherent. Moreover, I is additive on tensor products, and ρ → σ by a thermal operation implies $I(\rho) \ge I(\sigma)$, since thermal operations are covariant. Applying these properties to (8) tells us that $I(\rho_A) \ge I(\rho'_A)$, i.e. if $\rho_A$ is block-diagonal then so is $\rho'_A$.
However, this kind of reasoning cannot be used to rule out Main Result 1: in general, it may hold that $I(\sigma_A) + I(\sigma_M) > I(\sigma_{AM})$, and in this sense, correlations can increase the total amount of coherence as summed over all subsystems. This phenomenon is also at the heart of Åberg's result [57], which gives us further evidence for the conjecture above. While Åberg's setting is different from the one in this paper (his catalyst changes its state during every operation and, in particular, is infinite-dimensional, thus exceeding the strict notion of cyclicity that we have adopted here; similar comments apply to the improved results by Korzekwa et al. [41]), his setup allows one to "broadcast" coherence in some sense indefinitely catalytically, while correlating the catalyst with the systems on which it acts, in much the same way as in this paper. It has been noted that this comes at the price of correlating the systems on which the catalyst successively acts [58]. Therefore, the conjecture above blends into a series of questions about how to best use coherence catalytically [59]. We leave the resolution of this conjecture to future work.

III. CONCLUSIONS
It has been argued in [10] that the Helmholtz free energy loses its role as the unique indicator of state transitions in small-scale thermodynamics.Instead, an infinite family of "α-free energies" takes its place.It has been noted that this implies in particular that there is an inherent irreversibility at the nanoscale: while it takes F ∞ (ρ) + k B T log Z to create a state ρ, only work F 0 (ρ) + k B T log Z can be extracted if one is given one copy of ρ, where in general F 0 < F ∞ .But these results have been obtained under the assumption that the corresponding thermal machine remains uncorrelated from the systems on which it acts.In this paper, we have argued that this restriction can be lifted in many situations, and we have shown that this restores the distinguished role of the Helmholtz free energy F .Moreover, work extraction and formation at the free energy difference can be achieved without any fluctuations, up to a minor tweak in the extraction case.
Does this mean that we have restored reversibility at the nanoscale? Not quite. An interesting perspective to take is that this irreversibility has simply been shifted, from work to correlations. That is, while work cost and extractable work are now both equal to F(ρ) (up to the Open Problem of Subsection II E), a new form of irreversibility has appeared: namely, initially uncorrelated systems (for example, A and M) become correlated. It is interesting to see that this brings us closer to discussions of the founding days of thermodynamics: Boltzmann's H-theorem [71], for example, derives the non-decrease of entropy in a gas from the assumption that the velocities of molecules are initially uncorrelated (i.e. factorize), but they become correlated after a collision ("Stoßzahlansatz"). This introduces naturally an "arrow of time", and the fluctuation-free single-shot work formation and extraction in the present paper comes at the price of introducing an analogous "aging" to the physical systems, with "wrinkles" given by correlations.
We emphasize that the results of this paper are not primarily meant as a criticism of earlier work. The point is not that it would be "wrong" to demand that the catalyst is returned uncorrelated (as in (8)), but rather that the thermodynamic task of state conversion, when considered at the nanoscale, comes in two different versions: one version applicable to situations in which the machine acts on the same system multiple times, such that the catalyst must be returned uncorrelated; and a second version, in which the machine acts on many different quantum systems individually (and on each only once), in which case correlations are allowed to persist.
In this paper, any "probability distribution" (or just "distribution") is assumed to be discrete, i.e. is a vector $p \in \mathbb{R}^m$ for some $m \in \mathbb{N}$ such that $p = (p_1, \ldots, p_m)$, $p_i \ge 0$, $\sum_{i=1}^m p_i = 1$. We interpret it as a probability distribution on the discrete sample space {1, 2, ..., m}, and we will usually denote the corresponding probability space by an uppercase letter like A, following quantum information terminology, writing $p \equiv p_A$. Given two probability spaces ("systems") A and B, we can consider the composite probability space AB with a sample space that is the direct product of the two sample spaces. Independent product distributions will then be represented by vectors $p_A \otimes q_B$, and we can write joint probability distributions $q \equiv q_{AB}$ in matrix form, by collecting the probabilities $q_{AB}(i, j)$ into a table. Summing over the rows resp. columns of this matrix will give the marginal distributions on A resp. B. We will sometimes slightly abuse notation and use uppercase letters like A also as placeholders for the vector space $\mathbb{R}^m$ that contains its probability distributions, writing for example $p \in A$ instead of $p \in \mathbb{R}^m$. This improves clarity in cases where there is more than one system with sample space {1, 2, ..., m}. Moreover, probability distributions will sometimes be called "states", again following quantum information terminology.
We define the notions of majorization [45] and α-Rényi entropies $H_\alpha$ as well as Burg entropy $H_{\rm Burg}$ as described in Subsection II F. A stochastic map is a linear map Λ : A → A that maps probability distributions to probability distributions. A stochastic map is bistochastic if it preserves the uniform distribution $\mu = (\frac{1}{m}, \ldots, \frac{1}{m}) \in \mathbb{R}^m$, i.e. Λ(µ) = µ. It is well-known that $p \succ q$ is equivalent to the existence of a bistochastic map Λ such that Λ(p) = q [45]. Following [34,35], we say that a distribution $p_A$ trumps another distribution $q_A$, denoted $p \succ_T q$, if there exists another (finite discrete) system B and a distribution $c_B$ such that
$$p_A \otimes c_B \succ q_A \otimes c_B.$$
As explained in Subsection II F, the relation $p \succ_T q$ for $p^\downarrow \ne q^\downarrow$ is equivalent to $H_\alpha(p) < H_\alpha(q)$ for all $\alpha \in \mathbb{R} \setminus \{0\}$ and $H_{\rm Burg}(p) < H_{\rm Burg}(q)$, which was proven in [48,49].
We use the trace norm (or trace distance [51]) $\|\cdot\|$ on $\mathbb{R}^m$. Stochastic maps Λ do not increase the trace norm, i.e. $\|\Lambda(a)\| \le \|a\|$ for all $a \in \mathbb{R}^m$. Following [10] (see also [61]) we define the Rényi divergences, or relative Rényi entropies, for distributions $p, q \in \mathbb{R}^m$ as
$$S_\alpha(p \| q) = \frac{\mathrm{sgn}(\alpha)}{\alpha - 1} \log \sum_{i=1}^m p_i^\alpha q_i^{1-\alpha}.$$
We will always assume that there is a fixed "background inverse temperature" β > 0, and we will use the definition $k_B T := 1/\beta$, where we interpret T as the temperature and $k_B$ as the Boltzmann constant. The α-free energies $F_\alpha$ are defined as [10]
$$F_\alpha(p) = k_B T\, S_\alpha(p \| \gamma) - k_B T \log Z,$$
where $p \in A$ is any state, $Z \equiv Z_A = \sum_{i=1}^m \exp(-\beta E_i)$ is the partition function with $H_A = (E_1, \ldots, E_m)$ the Hamiltonian (which, as described in Subsection II F, is now a vector with the energy levels as entries), and $\gamma = (\gamma_1, \ldots, \gamma_m)$ with $\gamma_i = \exp(-\beta E_i)/Z$ the thermal state (or Gibbs state).
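As a sanity check of these definitions, the following Python sketch computes α-free energies for classical distributions. It assumes the form $F_\alpha(p) = k_B T\, S_\alpha(p\|\gamma) - k_B T \log Z$ with $S_\alpha(p\|q) = \frac{\mathrm{sgn}(\alpha)}{\alpha-1}\log\sum_i p_i^\alpha q_i^{1-\alpha}$, natural logarithms, and units in which $k_B = 1$; the function names and the example spectrum are hypothetical. For α = 1 the result should coincide with the Helmholtz free energy $\langle E \rangle - TS$, and the Gibbs state should give $-k_B T \log Z$ for every α.

```python
import math

def gibbs_state(energies, beta):
    """Return the Gibbs distribution and the partition function Z."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def renyi_divergence(p, q, alpha):
    """Assumed convention: sgn(alpha)/(alpha-1) * log sum_i p_i^alpha q_i^(1-alpha)."""
    if alpha == 1:  # relative entropy (Kullback-Leibler divergence)
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    if alpha == 0:  # only the support of p contributes
        return -math.log(sum(qi for pi, qi in zip(p, q) if pi > 0))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.copysign(1.0, alpha) / (alpha - 1.0) * math.log(s)

def alpha_free_energy(p, energies, beta, alpha):
    """F_alpha(p) in units with k_B = 1, so that k_B T = 1 / beta."""
    gamma, z = gibbs_state(energies, beta)
    return (renyi_divergence(p, gamma, alpha) - math.log(z)) / beta

# Hypothetical three-level system.
energies = [0.0, 1.0, 2.5]
beta = 0.7
p = [0.5, 0.3, 0.2]
gamma, z = gibbs_state(energies, beta)

mean_energy = sum(pi * e for pi, e in zip(p, energies))
entropy = -sum(pi * math.log(pi) for pi in p)
helmholtz = mean_energy - entropy / beta
f_one = alpha_free_energy(p, energies, beta, 1)
```

The α = 0 branch sums only over the support of the state, which matters for rank-deficient distributions; negative α requires full rank.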
Recall the definition of a thermal operation in Figure 1, but in the special case that the system M is trivial, i.e.AM = A. If all states are block-diagonal, then we have a "classical" version of a thermal operation, acting effectively on classical probability distributions.If p, q ∈ A are probability distributions, we can ask under what conditions a thermal operation can map the quantum state diag(p) to diag(q).This question was answered in [44], see also [46,63,64]: this transition is possible to arbitrary accuracy if and only if there exists a stochastic map Λ with Λ(p) = q and Λ(γ A ) = γ A (actually, in many but not all cases, the target state q can be produced exactly by a thermal operation, i.e. with perfect accuracy, as discussed in [46]).Therefore, the existence of a thermal operation that maps one block-diagonal state to another can be shown by constructing a corresponding "Gibbs-preserving" stochastic map which maps the initial to the final distribution.The main result of [10] was to give a criterion for the existence of a stochastic map Λ with the above properties: basically (for details see [10]), F α (p) ≥ F α (q) for all α is sufficient and necessary for the existence of such a map (we will not use this result directly in what follows).

B. Results for trivial Hamiltonian
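As explained in the main text, we will in the following consider a particular family of bipartite probability distributions. For any given probability distribution $q \equiv q_A = (q_1, \ldots, q_m) \in \mathbb{R}^m$ with $q_i \ne 0$ for all i, we define the extension $q_{AB}$ as the matrix whose i-th row is
$$\Big( \underbrace{\tfrac{q_i - 2\delta}{n}, \ldots, \tfrac{q_i - 2\delta}{n}}_{n}, \ \underbrace{\tfrac{\delta}{n^2}, \ldots, \tfrac{\delta}{n^2}}_{n^2}, \ \delta \Big), \quad i = 1, \ldots, m, \qquad (9)$$
where $n \in \mathbb{N}$ and $0 < \delta < \frac{1}{2} \min_i q_i$. This is an $m \times (n^2 + n + 1)$-matrix with strictly positive entries which defines a joint probability distribution on AB. Summing over the rows shows that it has q as its marginal on A. Its marginal on B is
$$q_B = \Big( \underbrace{\tfrac{1 - 2m\delta}{n}, \ldots, \tfrac{1 - 2m\delta}{n}}_{n}, \ \underbrace{\tfrac{m\delta}{n^2}, \ldots, \tfrac{m\delta}{n^2}}_{n^2}, \ m\delta \Big).$$
By direct computation, it turns out that the mutual information in $q_{AB}$ is independent of n: with $\eta(x) := -x \log x$,
$$I(A:B) = H(q) - \sum_{i=1}^m \eta(q_i - 2\delta) + \eta(1 - 2m\delta) - 2m\delta \log m, \qquad (10)$$
and we have in particular $\lim_{\delta \searrow 0} I(A:B) = 0$.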
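The properties claimed for this family can be verified by brute force for small n. The Python sketch below assumes the row layout used throughout this example (n entries $(q_i - 2\delta)/n$, $n^2$ entries $\delta/n^2$, one entry δ), which is a reconstruction consistent with the stated matrix dimension and marginals rather than a verbatim copy of the original formula. The function names and the sample distribution are hypothetical.

```python
import math

def extension(q, delta, n):
    """Assumed m x (n^2 + n + 1) extension of q (see the hedged row layout)."""
    if not 0 < delta < 0.5 * min(q):
        raise ValueError("need 0 < delta < min(q) / 2")
    return [[(qi - 2 * delta) / n] * n + [delta / n ** 2] * (n * n) + [delta]
            for qi in q]

def shannon(values):
    """Shannon entropy in nats of a list of probabilities."""
    return -sum(v * math.log(v) for v in values if v > 0)

def mutual_information(joint):
    """I(A:B) = H(A) + H(B) - H(AB) for a joint distribution given as a matrix."""
    marginal_a = [sum(row) for row in joint]
    marginal_b = [sum(col) for col in zip(*joint)]
    flat = [v for row in joint for v in row]
    return shannon(marginal_a) + shannon(marginal_b) - shannon(flat)

q = [0.5, 0.3, 0.2]          # illustrative full-rank distribution
joint = extension(q, 0.01, 3)
row_sums = [sum(row) for row in joint]

mi_n2 = mutual_information(extension(q, 0.01, 2))
mi_n5 = mutual_information(extension(q, 0.01, 5))
mi_small_delta = mutual_information(extension(q, 1e-4, 2))
```

With these inputs the two values `mi_n2` and `mi_n5` agree up to rounding, and shrinking δ drives the mutual information towards zero.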
Lemma 1. Let $p, q \in \mathbb{R}^m$ be probability distributions with full rank such that $H(p) < H(q)$. Then, for every ε > 0, there exist δ > 0 with $\delta < \frac{1}{2} \min_i q_i$ and $n \in \mathbb{N}$ such that $q_{AB}$ as defined in (9) satisfies $p_A \otimes q_B \succ_T q_{AB}$ as well as $I(A:B) \equiv S(q_{AB} \,\|\, q_A \otimes q_B) < \varepsilon$.
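Proof. For $\alpha \in \mathbb{R} \cup \{-\infty, +\infty\}$, define the entropy difference
$$\Delta_n^{(\alpha)} := H_\alpha(q_{AB}) - H_\alpha(p_A \otimes q_B) = H_\alpha(q_{AB}) - H_\alpha(q_B) - H_\alpha(p).$$
We claim that $\Delta_n^{(\alpha)}$ is everywhere continuous in α. By definition this is true for all α ≠ 0; for α = 0, it follows from the fact that p and q both have full rank that $\lim_{\alpha \to 0} \Delta_n^{(\alpha)} = 0 = \Delta_n^{(0)}$. Defining $\eta(x) := -x \log x$ for x ≠ 0 and η(0) := 0, we get
$$\Delta_n^{(1)} = \sum_{i=1}^m \eta(q_i - 2\delta) + 2m\,\eta(\delta) - \eta(1 - 2m\delta) - 2\eta(m\delta) - H(p).$$
All n-dependence miraculously cancels out, and we have $\lim_{\delta \searrow 0} \Delta_n^{(1)} = H(q) - H(p) > 0$.
By continuity, positivity of $\Delta_n^{(1)}$ is ensured if δ is small enough. Furthermore, due to (10), if δ is small enough, we will also have I(A : B) < ε (note that I(A : B) is in particular independent of n). We thus choose some $\delta \in (0, \frac{1}{m})$ small enough for both and keep it fixed in all that follows. Consequently, $\Delta_n^{(1)}$ is constant in n and positive, and $0 < \delta < \frac{1}{m}$. For finite α ∉ {0, 1}, we get
$$\Delta_n^{(\alpha)} = \frac{\mathrm{sgn}(\alpha)}{1-\alpha} \log \frac{n^{1-\alpha} \sum_{i=1}^m (q_i - 2\delta)^\alpha + m \delta^\alpha (n^{2-2\alpha} + 1)}{n^{1-\alpha} (1 - 2m\delta)^\alpha + m^\alpha \delta^\alpha (n^{2-2\alpha} + 1)} - H_\alpha(p). \qquad (11)$$
We claim that this expression is increasing in n, for every non-zero $\alpha \in \mathbb{R} \cup \{-\infty, \infty\}$. We have already shown this for α = 1, and now we will show it for all other α ∉ {0, 1} by considering the following cases:
• If α < 0, then $\Delta_n^{(\alpha)}$ is increasing in n if and only if the fraction on the right-hand side of (11) is decreasing in $x := n^{1-\alpha}$. In other words, we have a function
$$f(x) := \frac{x \sum_{i=1}^m (q_i - 2\delta)^\alpha + m \delta^\alpha (x^2 + 1)}{x (1 - 2m\delta)^\alpha + m^\alpha \delta^\alpha (x^2 + 1)}, \qquad (12)$$
and we have to show that it is decreasing in x; note that we are only interested in x ≥ 1, since $n^{1-\alpha} \ge n \ge 1$. To this end, we can simply look at the derivative
$$f'(x) = \frac{\delta^\alpha (x^2 - 1) \left[ m (1 - 2m\delta)^\alpha - m^\alpha \sum_{i=1}^m (q_i - 2\delta)^\alpha \right]}{\left[ x (1 - 2m\delta)^\alpha + m^\alpha \delta^\alpha (x^2 + 1) \right]^2},$$
and we see that it only remains to be shown that $m^\alpha \sum_{i=1}^m (q_i - 2\delta)^\alpha \ge m (1 - 2m\delta)^\alpha$. Let $r_i := (q_i - 2\delta)/(1 - 2m\delta)$, then $r = (r_1, \ldots, r_m)$ is a probability distribution, and $H_\alpha(r) \le -\log m$, which implies $\sum_{i=1}^m r_i^\alpha \ge m^{1-\alpha}$, and so
$$m^\alpha \sum_{i=1}^m (q_i - 2\delta)^\alpha = m^\alpha (1 - 2m\delta)^\alpha \sum_{i=1}^m r_i^\alpha \ge m (1 - 2m\delta)^\alpha,$$
which shows that $f'(x) \le 0$ in the relevant interval for x, and we are done.
• If 0 < α < 1, we can argue similarly, except that now the function f in (12) has to be increasing in $x = n^{1-\alpha}$. We can argue via the derivative exactly as above, but now $H_\alpha(r) \le \log m$, hence $\sum_{i=1}^m r_i^\alpha \le m^{1-\alpha}$, and therefore $m^\alpha \sum_{i=1}^m (q_i - 2\delta)^\alpha \le m (1 - 2m\delta)^\alpha$, which gives us the opposite sign, $f'(x) \ge 0$, as desired.
• If α > 1, then the function f in (12) also has to be increasing in $x = n^{1-\alpha}$, but since 1 − α < 0, we are now only interested in the interval 0 < x < 1. On the one hand, we now have $\sum_{i=1}^m r_i^\alpha \ge m^{1-\alpha}$, which implies $m^\alpha \sum_{i=1}^m (q_i - 2\delta)^\alpha \ge m (1 - 2m\delta)^\alpha$, but on the other hand, the factor $(x^2 - 1)$ in the derivative becomes negative, hence $f'(x) \ge 0$.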
• By continuity, ∆
Proof. While p does not necessarily have full rank, the distribution $p^{(\kappa)} \in \mathbb{R}^m$ does (for every 0 < κ < 1), where $p^{(\kappa)}_i := (1 - \kappa) p_i + \kappa/m$. Since $H(p) < H(q)$ and $\lim_{\kappa \searrow 0} H(p^{(\kappa)}) = H(p)$, there exists some κ > 0 (smaller than one) such that $H(p^{(\kappa)}) < H(q)$. Thus, we can apply Lemma 1 and get that there exists a system C of suitable dimension and an extension $q_{AC}$ of $q = q_A$ such that $p^{(\kappa)}_A \otimes q_C \succ_T q_{AC}$ and $S(q_{AC} \,\|\, q_A \otimes q_C) < \varepsilon$. But $p \succ p^{(\kappa)}$, hence $p_A \otimes q_C \succ p^{(\kappa)}_A \otimes q_C$, therefore $p_A \otimes q_C \succ_T p^{(\kappa)}_A \otimes q_C$. Since the trumping relation is transitive, it follows that $p_A \otimes q_C \succ_T q_{AC}$. By definition of trumping, there exists yet another system D of suitable dimension and a distribution $r_D$ such that $p_A \otimes q_C \otimes r_D \succ q_{AC} \otimes r_D$. Now we define B to be the joint system CD, and $q_{AB} := q_{AC} \otimes r_D$, then $q_B = q_C \otimes r_D$, and we have $p_A \otimes q_B \succ q_{AB}$. Furthermore, $S(q_{AB} \,\|\, q_A \otimes q_B) = S(q_{AC} \otimes r_D \,\|\, q_A \otimes q_C \otimes r_D) = S(q_{AC} \,\|\, q_A \otimes q_C) < \varepsilon$.
This completes the proof.
This allows us to prove the main theorem of Subsection II F:
Theorem 3. Let $p, q \in \mathbb{R}^m$ be probability distributions with $p^\downarrow \ne q^\downarrow$. Then there exists an extension $q_{AB}$ of $q = q_A$ such that $p_A \otimes q_B \succ q_{AB}$ if and only if $H_0(p) \le H_0(q)$ and $H(p) < H(q)$. Moreover, if these inequalities are satisfied, we can always choose B and $q_{AB}$ such that $I(A:B) \equiv S(q_{AB} \,\|\, q_A \otimes q_B) < \varepsilon$, for any choice of ε > 0.
Proof. "Only if" part. If $p^\downarrow \ne q^\downarrow$ and $p_A \otimes q_B \succ q_{AB}$, then we get due to additivity, subadditivity and Schur concavity of $H_\alpha$ for α ∈ {0, 1}
$$H_\alpha(p_A) + H_\alpha(q_B) = H_\alpha(p_A \otimes q_B) \le H_\alpha(q_{AB}) \le H_\alpha(q_A) + H_\alpha(q_B),$$
thus $H_\alpha(p) \le H_\alpha(q)$. This shows that $H_0(p) \le H_0(q)$. Now consider the α = 1 case. While we also get $H(p) \le H(q)$, equality (i.e. $H(p) = H(q)$) would entail that $H(q_{AB}) = H(q_A) + H(q_B)$, which is only possible if $q_{AB} = q_A \otimes q_B$. But this would give us $p_A \otimes q_B \succ q_A \otimes q_B$, or $p_A \succ_T q_A$ for $p^\downarrow \ne q^\downarrow$, which implies that $H(p) < H(q)$, contradicting the assumed equality.
"If" part. We may assume without loss of generality that the entries of p and q are sorted in non-increasing order, i.e. $p_1 \ge p_2 \ge \ldots$ and $q_1 \ge q_2 \ge \ldots$. Since q may not have full rank, we can "split off all zeros", by writing
$$q = (\tilde q, 0, \ldots, 0)^T,$$
where $\tilde q \in \mathbb{R}^d$ has full rank, i.e. does not contain zeros, such that $d = 2^{H_0(q)} \le m$.
Since $H_0(p) \le H_0(q)$, the distribution p must contain at least as many zeros as q, such that we can also split off (m − d) zeros, and write $p = (\tilde p, 0, \ldots, 0)^T$, where $\tilde p \in \mathbb{R}^d$. But then $H(\tilde p) = H(p) < H(q) = H(\tilde q)$, so Corollary 2 tells us that there is an extension $\tilde q_{AB}$ of $\tilde q$ such that $\tilde p_A \otimes \tilde q_B \succ \tilde q_{AB}$. Moreover, no matter what ε > 0 we have chosen, we can always choose B and $\tilde q_{AB}$ such that $S(\tilde q_{AB} \,\|\, \tilde q_A \otimes \tilde q_B) < \varepsilon$. Using our matrix notation for bipartite distributions, denoting the dimension of the system B by k, and using that adding a fixed number of zeros to two distributions does not change their majorization relation, we obtain
$$p_A \otimes \tilde q_B = \begin{pmatrix} \tilde p_A \otimes \tilde q_B \\ 0_{(m-d) \times k} \end{pmatrix} \succ \begin{pmatrix} \tilde q_{AB} \\ 0_{(m-d) \times k} \end{pmatrix} =: q_{AB},$$
where $\tilde q_{AB}$ denotes $\tilde q_{AB}$ as a large matrix block. By summing over the rows, one sees that the marginal of $q_{AB}$ on A is $(\tilde q_1, \ldots, \tilde q_d, 0, \ldots, 0)^T = q_A$, and by summing over the columns, one obtains $\tilde q_B$ as the marginal on B. Thus, $q_{AB}$ is the sought-for extension. Moreover, since the relative entropy does not change if common zero entries of both arguments are removed, we also have $S(q_{AB} \,\|\, q_A \otimes q_B) = S(\tilde q_{AB} \,\|\, \tilde q_A \otimes \tilde q_B) < \varepsilon$.
This result allows us to answer an open problem from [42]. There we have defined a notion of correlated trumping: we say that p c-trumps q, denoted $p \succ_c q$, if there exists some $k \in \mathbb{N}_0$ and a k-partite distribution $r_{1,2,\ldots,k}$ such that
$$p \otimes (r_1 \otimes r_2 \otimes \ldots \otimes r_k) \succ q \otimes r_{1,2,\ldots,k}, \qquad (15)$$
where $r_1, \ldots, r_k$ are the marginals of $r_{1,\ldots,k}$. In [42], we have shown that $p \succ_c q$ for $p^\downarrow \ne q^\downarrow$ if and only if $H_0(p) \le H_0(q)$ and $H(p) < H(q)$. We have also shown that we can always choose k = 3, but we were not able to answer the question whether k = 2 catalysts are always sufficient. Theorem 3 allows us to answer this question in the positive.
Theorem 4. Let $p, q \in \mathbb{R}^m$ be probability distributions with $p^\downarrow \ne q^\downarrow$. Then there exist auxiliary systems B, C and a bipartite distribution $r_{BC}$ such that $p_A \otimes (r_B \otimes r_C) \succ q_A \otimes r_{BC}$ if and only if $H_0(p) \le H_0(q)$ and $H(p) < H(q)$. Here, $r_B$ and $r_C$ denote the marginals of $r_{BC}$.
Proof. The "only if"-part of the proof is completely analogous to the corresponding part of the proof of Theorem 3 and thus omitted. For the "if"-part, the premises $p^\downarrow \ne q^\downarrow$ and $H_0(p) \le H_0(q)$ as well as $H(p) < H(q)$ imply, due to Theorem 3, that there exists some auxiliary system C and an extension $q_{AC}$ of $q = q_A$ such that $p_A \otimes q_C \succ q_{AC}$. Now introduce another system B of the same dimension as A, and define a distribution $q_B$ which is just a copy of $q = q_A$. Then
$$p_A \otimes q_B \otimes q_C \succ q_{AC} \otimes q_B.$$
Finally, since the majorization relation is permutation-invariant, we perform the swap of systems A ↔ B on the right-hand side, and obtain $p_A \otimes (q_B \otimes q_C) \succ q_A \otimes q_{BC}$ (the left-hand side is simply a change of notation and not a physical swap). Thus, we can choose $r_{BC} := q_{BC}$.
Note that the results of [42], i.e. the characterization of c-trumping (as defined in ( 15)) via H and H 0 , is a strictly weaker result than the main majorization result of the present work, Theorem 3. First, as the proof of Theorem 4 above shows, the result of [42] can mathematically easily be obtained, and extended, from the results of the present paper.Second, Lemma 5 of [42] is a strictly weaker version of the present work's Theorem 3, establishing sufficiency of the monotonicity of all H α , for α ≥ 1, for the existence of a correlating catalytic state transition (between full-rank states), while now we know that monotonicity of H = H 1 is enough.Regarding the thermodynamic version of [42] described in [32], c-trumping as in (15) can be physically interpreted as the irreversible use of k auxiliary systems to admit a state transition p → q on the physical system of interest.That is, stochastic independence is used up as a "fuel" in a non-repeatable way.In contrast, the present paper describes a more natural thermodynamic scenario in which a single auxiliary system (that we can interpret as being part of a thermal machine) is used catalytically to implement state transitions on a single system.The auxiliary system can be used repeatedly on further copies of the system, which is arguably crucial for a thermodynamic cycle.

C. Results for non-trivial Hamiltonians
In this section, we will change our notation slightly, and call the auxiliary system M (for "thermal machine"), since B is misleading in the thermodynamic context (it could be confused with the "bath").
Our main tool to transfer the results for trivial Hamiltonians to the case of non-trivial Hamiltonians will be a technique that has been introduced in [10] and has also been applied in [32]: the embedding map $\Gamma_d$. Given any ordered list of positive integers $d = (d_1, \ldots, d_n)$, the stochastic map $\Gamma_d : \mathbb{R}^n \to \mathbb{R}^D$, with $D = \sum_{i=1}^n d_i$, is defined as
$$\Gamma_d(p) := \Big( \underbrace{\tfrac{p_1}{d_1}, \ldots, \tfrac{p_1}{d_1}}_{d_1}, \ \ldots, \ \underbrace{\tfrac{p_n}{d_n}, \ldots, \tfrac{p_n}{d_n}}_{d_n} \Big).$$
The sorted energy eigenvalues are $(E_1, \ldots, E_8) = (0, \Delta, 0, \Delta, E_A, E_A + \Delta, E_A, E_A + \Delta)$, where $E_A = k_B T \log 2$. We use the thermomajorization criterion as explained, for example, in the Supplementary Note E of [9]: there exists a thermal operation mapping p to q if and only if the thermal Lorenz curve of p is everywhere on or above the thermal Lorenz curve of q. Using Mathematica, we have generated the plots in Figure 5 for $\Delta = 0.26\, k_B T$, which shows that p's curve (in blue) is indeed nowhere below q's curve (in orange); the same must then be true for larger values of ∆ (and we have numerically verified this). We have also used Mathematica to verify directly the necessary inequalities for all "elbow points" of the curves.
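The elbow-point check mentioned above can be sketched in a few lines of Python. This is not the Mathematica code used for Figure 5; it is an independent illustration with hypothetical function names and a hypothetical two-level example. The thermal Lorenz curve of p is built by sorting the levels by $p_i/\gamma_i$ in non-increasing order and accumulating the points $(\sum \gamma_i, \sum p_i)$. Because the curve of p is concave and the curve of q is piecewise linear, it suffices to compare the two curves at the elbow points of q.

```python
import math

def gibbs_state(energies, beta):
    """Gibbs distribution for the given energy levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def lorenz_curve(p, gibbs):
    """Elbow points (xs, ys) of the thermal Lorenz curve of p."""
    order = sorted(range(len(p)), key=lambda i: p[i] / gibbs[i], reverse=True)
    xs, ys = [0.0], [0.0]
    for i in order:
        xs.append(xs[-1] + gibbs[i])
        ys.append(ys[-1] + p[i])
    return xs, ys

def curve_value(xs, ys, x):
    """Piecewise-linear interpolation of the curve at position x."""
    for k in range(1, len(xs)):
        if x <= xs[k] or k == len(xs) - 1:
            t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + t * (ys[k] - ys[k - 1])

def thermo_majorizes(p, q, gibbs, tol=1e-12):
    """True if the curve of p lies on or above the curve of q at q's elbows."""
    xp, yp = lorenz_curve(p, gibbs)
    xq, yq = lorenz_curve(q, gibbs)
    return all(curve_value(xp, yp, x) >= y - tol for x, y in zip(xq, yq))

# Hypothetical qubit with energy gap 1 at inverse temperature 1.
gamma = gibbs_state([0.0, 1.0], beta=1.0)
ground, excited = [1.0, 0.0], [0.0, 1.0]
```

For example, `thermo_majorizes(excited, ground, gamma)` holds while the reverse direction does not, and every state thermomajorizes the Gibbs state `gamma`.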

Figure 1: Thermal operation of the form that we are considering in this paper; compare Figure 1 in [10]. We have a system A that we would like to act on, by transforming its state $\rho_A$ into another state $\rho'_A$. We have access to a heat bath with an arbitrary Hamiltonian $H_B$, which is in its thermal state $\gamma_B$ at some fixed temperature T. The thermal machine contains a quantum system in state $\sigma_M$, and it controls a unitary transformation $U_{AMB}$ (symbolized by the pentagon), acting on the system A, heat bath B, and its internal system M. Crucially, this transformation is fully energy-preserving, i.e. $[U_{AMB}, H_A + H_M + H_B] = 0$. By tracing over the heat bath,
with $Z_A$ the partition function of A, T the background temperature, $k_B$ Boltzmann's constant, and $S_\alpha$ the Rényi divergence [61] of order α (see Subsection II F and Appendix). For α = 1, this reduces to the well-known Helmholtz free energy $F_1 = F$.
B. Example: smaller work cost with a single qubit catalyst
q h 6 9 B I v g q S Q i K J 4 K X j x W s K 3 S h r L Z T t K l u 5 u w u x F K 7 K / w 4 k E Rr / 4 c b / 4 b t 2 0 O 2 v p g 4 P H e D D P z w p Q z b T z v 2 y m t r K 6 t b 5 Q 3 K 1 v b O 7 t 7 1 f 2 D t k 4 y R b F F E 5 6 o + 5 B o 5 E x i y z D D 8 T 5 V S E T I s R O O r q d 2 / P a 8 1 r o o 4 y n A E x 3 A K P l x A A 2 6 g C S 2 g I O A Z X u H N U c 6 L 8 + 5 8 z F t L T j F z C H / g f P 4 A B 6 2 Q h A = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " p U G c n B O 9 g z + N u 7 03 n 9 R i C B E 2 z j U = " > A A A B 8 H i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I v g q S Q i K J 4 K X j x W s K 3 S h r L Z T t K l u 5 u w u x F K 7 K / w 4 k E Rr / 4 c b / 4 b t 2 0 O 2 v p g 4 P H e D D P z w p Q z b T z v 2 y m t r K 6 t b 5 Q 3 K 1 v b O 7 t 7 1 f 2 D t k 4 y R b F F E 5 6 o + 5 B o 5 E x i y z D D 8 T 5 V S E T I s R O O r q d (α) n = lim α 0 ∆ (α) n = ∆ (0) n = 0. Let us first compute this difference for α = 1.
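The free energies $F_\alpha$ referred to above can be evaluated numerically for states that are diagonal in the energy basis. The following is a minimal sketch, not the paper's own code: it assumes the form $F_\alpha(p) = k_B T\,[S_\alpha(p\|\gamma_A) - \log Z_A]$ that is standard in the literature (the defining equation is not reproduced in this passage), restricts to $\alpha > 0$, and uses the hypothetical function names `renyi_divergence` and `free_energy`.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence S_alpha(p || q) in nats, for alpha > 0.

    Assumes q has full support (true for a thermal distribution).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.isclose(alpha, 1.0):
        # alpha -> 1 limit: the Kullback-Leibler divergence.
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def free_energy(p, energies, kT, alpha):
    """Assumed form F_alpha(p) = kT * (S_alpha(p || gamma) - log Z)."""
    energies = np.asarray(energies, float)
    weights = np.exp(-energies / kT)
    Z = weights.sum()          # partition function Z_A
    gamma = weights / Z        # thermal distribution gamma_A
    return kT * (renyi_divergence(p, gamma, alpha) - np.log(Z))

# Illustrative two-level system (values chosen arbitrarily).
p = np.array([0.7, 0.3])
energies = np.array([0.0, 1.0])
kT = 1.0
helmholtz = float(p @ energies + kT * np.sum(p * np.log(p)))  # <E> - kT * S
f_values = [free_energy(p, energies, kT, a) for a in (0.5, 1.0, 2.0)]
```

Under these assumptions, `free_energy(p, energies, kT, 1.0)` coincides with the Helmholtz value `helmholtz`, and the entries of `f_values` are non-decreasing in $\alpha$, reflecting the monotonicity of the Rényi divergence in its order.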

Figure 5: The thermal Lorenz curves signify the possibility of the state transition (25) by a thermal operation.
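As an illustration of how thermal Lorenz curves decide such questions, here is a minimal sketch for block-diagonal (classical) states, using the standard thermo-majorization criterion: a transition from $p$ to $q$ by a thermal operation, without a catalyst, is possible exactly when the curve of $p$ lies nowhere below that of $q$. The function names `thermal_lorenz_curve` and `thermo_majorizes` and the two-level example are assumptions made for illustration; they are not the transition (25) of the paper.

```python
import numpy as np

def thermal_lorenz_curve(p, gamma):
    """Return the breakpoints (x, y) of the thermal Lorenz curve of p.

    Levels are sorted by decreasing p_i / gamma_i; the curve connects the
    points (sum of gamma_i, sum of p_i) taken in that order, starting at (0, 0).
    """
    p, gamma = np.asarray(p, float), np.asarray(gamma, float)
    order = np.argsort(-(p / gamma), kind="stable")
    x = np.concatenate(([0.0], np.cumsum(gamma[order])))
    y = np.concatenate(([0.0], np.cumsum(p[order])))
    return x, y

def thermo_majorizes(p, q, gamma, tol=1e-12):
    """True if the thermal Lorenz curve of p is nowhere below that of q."""
    xp, yp = thermal_lorenz_curve(p, gamma)
    xq, yq = thermal_lorenz_curve(q, gamma)
    # Both curves are piecewise linear, so comparing them on the union of
    # their breakpoints is sufficient.
    grid = np.union1d(xp, xq)
    return bool(np.all(np.interp(grid, xp, yp) >= np.interp(grid, xq, yq) - tol))

# Illustrative two-level system with energies 0 and 1 at kT = 1.
weights = np.exp(-np.array([0.0, 1.0]))
gamma = weights / weights.sum()
ground = np.array([1.0, 0.0])
excited = np.array([0.0, 1.0])
```

In this example the excited state thermo-majorizes the ground state but not vice versa, and every state thermo-majorizes the thermal state `gamma`, as one expects physically.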
Figure 3: Example work cost if correlations between $M$ and $A$ are allowed to build up. Since $M$ is locally exactly preserved, it can be reused on further states (just not on those on which it has already acted). This transition is possible at a work cost of only $\approx 0.26\,k_B T$ (about 1/3 less than in Figure [...]

[...] The good (and arguably surprising) news of the present work is that the latter case is particularly simple to characterize, namely in terms of the free energy $F$ alone. The question of which version to choose depends entirely on the physical context.

The results of this paper raise a multitude of interesting open problems. First, does Main Result 1 remain true in the presence of coherence? Can we reformulate the work extraction result (Main Result 3) without a max entropy sink (Open Problem in Subsection II E)? And finally, do machines that operate in this correlating-catalytic way have any realization in nature?

[...] and A. Acín, No-Go Theorem for the Characterization of Work Fluctuations in Coherent Quantum Systems, Phys. Rev. Lett. 118, 070601 (2017).
Lemma 5. Let $A$ be a system with thermal distribution $\gamma_A$ that has only rational entries, i.e. that can be written in the form [...] $\eta_i = (1/d_i, \ldots, 1/d_i) \in \mathbb{R}^{d_i}$ is the uniform distribution in $d_i$ dimensions, and $D = \sum_{i=1}^n d_i$.
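To make the objects in Lemma 5 concrete, the following sketch rests on two assumptions that are not visible in the statement as it appears here: that the rational thermal distribution has the form $\gamma_A = (d_1/D, \ldots, d_n/D)$, and that the $\eta_i$ enter through the usual embedding map $\Gamma(p) = \bigoplus_i p_i \eta_i$ from the resource-theory literature. The function name `embed` is hypothetical.

```python
import numpy as np

def embed(p, d):
    """Assumed embedding map: Gamma(p) = direct sum over i of p_i * eta_i.

    p : probabilities on n levels
    d : positive integers d_i; eta_i is uniform on d_i dimensions,
        so the output lives in D = sum(d) dimensions.
    """
    p = np.asarray(p, float)
    d = np.asarray(d, int)
    # Each p_i is spread evenly over a block of d_i entries.
    return np.repeat(p / d, d)

# Illustrative example: gamma_A = (3/4, 1/4), i.e. d = (3, 1) and D = 4.
d = np.array([3, 1])
D = int(d.sum())
gamma = d / D
p = np.array([0.5, 0.5])

embedded_gamma = embed(gamma, d)   # uniform distribution on D outcomes
embedded_p = embed(p, d)
```

With these assumptions, the thermal distribution is mapped to the uniform distribution on $D$ outcomes, and the relative entropy to $\gamma_A$ becomes $\log D$ minus the Shannon entropy of the embedded distribution, which is how the map relates thermo-majorization to ordinary majorization.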