On the sufficiency of entropic inequalities for detecting non-classicality in the Bell causal structure

Classical and quantum physics impose different constraints on the joint probability distributions of observed variables in a causal structure. These differences mean that certain correlations can be certified as non-classical, which has both foundational and practical importance. Rather than working with the probability distribution itself, it can instead be convenient to work with the entropies of the observed variables. In the Bell causal structure with two inputs and outputs per party, a technique that uses entropic inequalities is known that can always identify non-classical correlations. Here we consider the analogue of this technique in the generalization of this scenario to more outputs. We identify a family of non-classical correlations in the Bell scenario with two inputs and three outputs per party whose non-classicality cannot be detected through the direct analogue of the previous technique. We also show that use of Tsallis entropy instead of Shannon entropy does not help in this case. Furthermore, we give evidence that natural extensions of the technique also do not help. More precisely, our evidence suggests that even if we allow the observed correlations to be post-processed according to a general class of non-classicality non-increasing operations, entropic inequalities for either the Shannon or Tsallis entropies cannot detect the non-classicality, and hence that entropic inequalities are generally not sufficient to detect non-classicality in the Bell causal structure.


I. INTRODUCTION
Causal structures are a useful tool for understanding correlations between observed events. Such correlations may be mediated by an influence travelling from one to the other, or come about due to common causes, which may not be observed. The nature of any unobserved causes depends on the theory being considered. For instance, they may be classical, quantum or from a generalized probabilistic theory (GPT) [1], and the kinds of observed correlations that are possible in general depend on this. At the foundational level, studying the differences gives us insight into how the notion of causality differs between theories, while, on a practical level, these differences are crucial for applications in device-independent cryptography [2][3][4][5][6][7].
One way to establish a difference is to violate a Bell inequality [8], where we use the term to mean a necessary condition on the observed correlations when any unobserved systems are classical. The amount of violation can be thought of as a measure of the non-classicality (also termed non-locality) of the distribution. Bell inequalities are often introduced using the (bipartite) Bell structure (see Figure 1(a)). Here there are four observed variables: A and B corresponding to the inputs of each party, and X and Y corresponding to the outputs. In the case that the numbers of possible inputs are i_A and i_B and likewise the numbers of possible outputs are o_A and o_B, we call the scenario the (i_A, i_B, o_A, o_B) Bell scenario. For the (2, 2, 2, 2) case, the CHSH inequalities [9] are known to be the only class of Bell inequalities required to completely characterize the scenario (i.e., all extremal 2-setting, 2-outcome Bell inequalities are equivalent to the CHSH inequalities up to symmetry). As i_A, i_B, o_A and o_B increase, many new classes of extremal Bell inequalities (inequivalent to the CHSH inequalities) are found and these scenarios quickly become difficult to fully characterize [10][11][12][13].
One attempt at avoiding this difficulty is to move away from probability space and instead consider inequalities expressed in terms of the entropies of the variables involved. There are two ways that these can be used: either directly using the causal structure under consideration, or by using the post-selection technique in which the original causal structure is first modified (more details can be found later in this paper). Braunstein and Caves [14] were the first to derive an entropic Bell inequality. They considered the post-selected version of the Bell causal structure shown in Figure 1(b) and found entropic inequalities that hold for all classical distributions. These can be violated when one or more of the unobserved nodes are quantum, and hence behave like entropic versions of Bell inequalities.
It is worth noting that in the bipartite Bell causal structure without post-selection the sets of achievable Shannon entropies over the observed variables for classical and quantum causes coincide [15]. Hence, without post-selecting, there are no entropic Bell inequalities in this case. Further, the use of other entropic measures such as Tsallis entropies to analyse this problem in the absence of post-selection has also been shown to have limitations [16], and no quantum violations are known for the entropic Bell inequalities derived in [16]. Because of this, we focus on post-selected causal structures in this paper.
It is natural to ask whether the non-classicality of a distribution can always be detected through post-selected entropic inequalities. For the (d, d, 2, 2) Bell scenarios with d ≥ 2, this is known to be the case [17] in the following sense. For every non-classical distribution in the (d, d, 2, 2) Bell scenario, there is a transformation that does not make any distribution more non-classical, and such that the resulting distribution violates one of the Braunstein-Caves (BC) entropic inequalities. The main purpose of this work is to investigate whether a similar result holds for non-binary outcomes. We study the (2, 2, d, d) Bell scenarios with d > 2 and allow a class of post-processing operations (including mixing with classical distributions; see later for details), to see whether, when applied to any non-classical distribution, the result violates an entropic Bell inequality. We investigate this using both Shannon and Tsallis entropies.
The structure of the remainder of the paper is as follows. After introducing our notation and reviewing some existing work in Section II, we proceed to investigate the (2, 2, 3, 3) scenario. In Section III we analyse the (2, 2, 3, 3) scenario in entropy space using post-selection and compare the non-classicality detected by Shannon and Tsallis entropic inequalities. We consider a class of isotropic non-classical distributions in the (2, 2, 3, 3) scenario and give evidence that arbitrary Shannon entropic inequalities and a class of Tsallis entropic inequalities cannot detect the non-classicality of these distributions for any choice of mixing with classical distributions. In Section III B, we consider post-processing the observed distributions using more general operations that do not increase the non-classicality (we call these non-classicality non-increasing (NCNI) operations). This leads us to conjecture that particular non-classical distributions do not violate any of the considered entropic inequalities even after post-processing with a large class of NCNI operations, and hence that the method of [17] for the (2, 2, 2, 2) scenario does not generalize. Finally, in Section IV, we conclude and discuss some open questions. These results, along with those of [15, 16], highlight some of the limitations of the entropic approach to analysing causal structures.

A. Probability distributions and entropy
We begin with some notation. Given a conditional probability distribution p_{XY|AB} where A, B, X and Y have cardinalities i_A, i_B, o_A and o_B respectively, we can express the distribution using a table. For instance, in the case where all the variables take values in {0, 1}, the table lists the sixteen entries p(xy|ab), where p(xy|ab) is an abbreviation for p_{XY|AB}(xy|ab), and the generalisation to larger alphabets is analogous (see, e.g., [18]). This format is convenient because it makes it easy to check whether a distribution is no-signalling, i.e., to check that p_{X|AB} is independent of B and that p_{Y|AB} is independent of A.
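Given a random variable X, the Shannon entropy is defined by H(X) = −∑_x p_X(x) ln p_X(x). Given two random variables, X and Y, the conditional Shannon entropy is defined by H(X|Y) = −∑_{xy} p_{XY}(xy) ln p_{X|Y=y}(x). The following properties hold:
P1. Monotonicity: H(X) ≤ H(XY).
P2. Strong subadditivity: H(XZ) + H(YZ) ≥ H(XYZ) + H(Z).
P3. Chain rule: H(XY) = H(Y) + H(X|Y).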
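The two entropy definitions above can be checked numerically. The following is a minimal Python sketch, not taken from the paper: the function names and the example joint distribution are hypothetical, and entropies are computed in nats to match the natural logarithm used in the definitions. It verifies the chain rule H(XY) = H(Y) + H(X|Y) on one example.

```python
import math

def shannon_entropy(p):
    """H(X) = -sum_x p(x) ln p(x), in nats; p maps outcomes to probabilities."""
    return -sum(px * math.log(px) for px in p.values() if px > 0)

def conditional_entropy(p_xy):
    """H(X|Y) = -sum_{x,y} p(x,y) ln p(x|y); p_xy maps (x, y) to probabilities."""
    p_y = {}
    for (_, y), pxy in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + pxy
    return -sum(pxy * math.log(pxy / p_y[y])
                for (_, y), pxy in p_xy.items() if pxy > 0)

# Hypothetical example: X and Y are uniform bits that agree with probability 3/4.
p_xy = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
p_y = {0: 0.5, 1: 0.5}

h_joint = shannon_entropy(p_xy)                             # H(XY)
h_chain = shannon_entropy(p_y) + conditional_entropy(p_xy)  # H(Y) + H(X|Y)
```

Zero-probability events are skipped in both sums, which matches the usual convention 0 ln 0 = 0.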

B. Causal structures
The relationships between different variables of interest can be conveniently expressed as a causal structure. This is a directed acyclic graph (DAG) where the observed variables are nodes, and there may be additional nodes representing unobserved systems. Given such a causal structure, we distinguish the cases where the hidden systems are classical, quantum or are from some generalized probabilistic theory. For every classical causal structure that has at least one parentless observed node, a post-selected causal structure can be defined. The general technique for doing this can be found in [19] (for example).
In this work we will only consider the Bell causal structure with two inputs per party and the post-selected version thereof (see Figure 1). The post-selected causal structure is obtained by removing the parentless observed nodes A and B in the original causal structure 1(a) and replacing the descendants X and Y with two copies of each, i.e., X_{A=0}, X_{A=1}, Y_{B=0}, Y_{B=1}, such that the original causal relations are preserved and there is no mixing between the copies (this is shown in Figure 1(b)). It makes sense to do this in the classical case because classical information can be copied, so we can simultaneously consider the outcome X given A = 0 and that given A = 1. By contrast, in the quantum case the values of A correspond to different measurements that are used to generate X. It hence does not make sense to consider a joint distribution over X_{A=0} and X_{A=1} in this case. We therefore only consider the subsets of the observed variables that co-exist, S := {X_0, X_1, Y_0, Y_1, X_0Y_0, X_0Y_1, X_1Y_0, X_1Y_1}, where we use the short form X_0Y_0 for the set {X_0, Y_0} etc. For any inequalities derived for the co-existing sets in the classical case we can look for quantum or GPT violations.
C. The (2,2,2,2) Bell scenario in probability space
For the bipartite Bell causal structure 1(a), the set of all observed distributions p_{XY|AB} that can arise when Λ is classical corresponds to the set of correlations that admit a local hidden variable model, i.e., this is the set that can be written
p_{XY|AB}(xy|ab) = ∑_λ p_Λ(λ) p_{X|AΛ}(x|aλ) p_{Y|BΛ}(y|bλ).
In this work we will refer to such correlations either as local or as classical and denote the set of all such distributions L. We also use L^{(2,2,2,2)} to denote the local distributions in the (2, 2, 2, 2) case (and analogously for other cases).
The set of local correlations forms a convex polytope, which can be specified in terms of a finite set of Bell inequalities, each a necessary condition for classicality. In the (2, 2, 2, 2) case, there are eight extremal Bell inequalities (facets of the local polytope). One of these is the CHSH inequality, denoted I_CHSH, and the other seven are equivalent to it under local relabellings [18]. We denote these by I^k_CHSH for k ∈ [8], where I^1_CHSH := I_CHSH and [n] stands for the set {1, 2, ..., n} where n is a positive integer. This provides the facet description of the (2, 2, 2, 2) local polytope.
In the vertex picture, the (2, 2, 2, 2) local polytope has 16 local deterministic vertices, and the (2, 2, 2, 2) non-signalling polytope shares the vertices of the local polytope and has eight more: the PR box and seven distinct local relabellings [18]. The PR box distribution, p_PR, satisfies X ⊕ Y = A·B and has the form p_PR(xy|ab) = 1/2 if x ⊕ y = a·b and p_PR(xy|ab) = 0 otherwise.
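The vertex counts quoted above can be reproduced by brute force. The following Python sketch is an illustration rather than the method used in the paper: it represents a distribution as a dictionary keyed by (x, y, a, b), a representation chosen here for convenience, enumerates the local deterministic strategies x = f(a), y = g(b), and checks that the PR box is no-signalling. All function names are hypothetical.

```python
from itertools import product

def local_deterministic_vertices(n_in, n_out):
    """All distributions p(xy|ab) with x = f(a) and y = g(b), keyed by (x, y, a, b)."""
    # A strategy f is stored as the tuple (f(0), ..., f(n_in - 1)).
    strategies = list(product(range(n_out), repeat=n_in))
    keys = list(product(range(n_out), range(n_out), range(n_in), range(n_in)))
    return [{(x, y, a, b): float(x == f[a] and y == g[b]) for (x, y, a, b) in keys}
            for f, g in product(strategies, repeat=2)]

def pr_box():
    """p(xy|ab) = 1/2 if x XOR y = a*b, and 0 otherwise."""
    return {(x, y, a, b): 0.5 if (x ^ y) == a * b else 0.0
            for x, y, a, b in product(range(2), repeat=4)}

def is_no_signalling(p, n_in, n_out, tol=1e-12):
    """Check that p(x|ab) does not depend on b and that p(y|ab) does not depend on a."""
    for a, x in product(range(n_in), range(n_out)):
        marginals = [sum(p[(x, y, a, b)] for y in range(n_out)) for b in range(n_in)]
        if max(marginals) - min(marginals) > tol:
            return False
    for b, y in product(range(n_in), range(n_out)):
        marginals = [sum(p[(x, y, a, b)] for x in range(n_out)) for a in range(n_in)]
        if max(marginals) - min(marginals) > tol:
            return False
    return True

n_vertices_2222 = len(local_deterministic_vertices(2, 2))  # two inputs, two outputs
n_vertices_2233 = len(local_deterministic_vertices(2, 3))  # two inputs, three outputs
```

With two inputs per party, each party has o^2 deterministic strategies for o outputs, so the enumeration gives 4 × 4 = 16 vertices for two outputs and 9 × 9 = 81 for three.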

D. Entropic inequalities and post-selection
In [14], Braunstein and Caves derived a set of constraints on the post-selected causal structure of Figure 1(b) and showed that these constraints can be violated by quantum correlations. To discuss these we introduce the notion of entropic classicality. For every distribution p_{XY|AB} in the Bell causal structure (Figure 1(a)), we can associate an entropy vector v ∈ R^8 in the post-selected causal structure (Figure 1(b)) whose components are the entropies of each element of the set S (Equation (2)) distributed according to p_{X_aY_b} := p_{XY|A=a,B=b}. Let H be the map that takes the observed distribution to its corresponding entropy vector in the post-selected causal structure.
Definition 1 (Entropic classicality). An entropy vector v ∈ R^8 is classical with respect to the bipartite Bell causal structure (Figure 1(a)) if there exists a local probability distribution p_{XY|AB} ∈ L such that H(p_{XY|AB}) = v. Further, a distribution p_{XY|AB} is entropically classical if there exists a classical distribution with the same entropy vector, i.e., if there exists a classical entropy vector v such that H(p_{XY|AB}) = v.
The set of all classical entropy vectors forms a convex cone. An example of a non-classical distribution that is entropically classical is discussed in Section II E.
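To make the map H concrete, the following Python sketch computes the eight-component entropy vector of a no-signalling distribution stored as a dictionary keyed by (x, y, a, b). It is an illustration under stated assumptions: the component labels, the function names and the comparison distribution `p_corr` (perfectly correlated uniform outputs for every input pair, which is local because it arises from a shared random bit) are choices made here, not notation from the paper.

```python
import math
from itertools import product

def shannon(probs):
    """Shannon entropy in nats of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_vector(p, n_out=2):
    """Entropies of X0, X1, Y0, Y1 and of each pair XaYb.

    Assumes p is no-signalling, so the marginal of X given A=a can be
    read off at B=0, and the marginal of Y given B=b at A=0.
    """
    outs = range(n_out)
    v = {}
    for a in (0, 1):
        v[f"X{a}"] = shannon([sum(p[(x, y, a, 0)] for y in outs) for x in outs])
    for b in (0, 1):
        v[f"Y{b}"] = shannon([sum(p[(x, y, 0, b)] for x in outs) for y in outs])
    for a, b in product((0, 1), repeat=2):
        v[f"X{a}Y{b}"] = shannon([p[(x, y, a, b)] for x, y in product(outs, repeat=2)])
    return v

keys = list(product(range(2), repeat=4))
# PR box: x XOR y = a*b with uniform marginals (non-classical).
p_pr = {(x, y, a, b): 0.5 if (x ^ y) == a * b else 0.0 for (x, y, a, b) in keys}
# Assumed classical comparison: x = y, uniform, for every input pair.
p_corr = {(x, y, a, b): 0.5 if x == y else 0.0 for (x, y, a, b) in keys}

v_pr = entropy_vector(p_pr)
v_corr = entropy_vector(p_corr)
```

Every component of both vectors equals ln 2, so in this example a non-classical distribution and a classical one are indistinguishable at the level of entropy vectors.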
We now review how the Braunstein-Caves (BC) inequalities are derived for the case when the observed parentless nodes A and B are binary. In this case, the post-selected causal structure 1(b) imposes no additional constraints on the distribution (or entropies) of the observed nodes X_0, X_1, Y_0 and Y_1 because they share a common parent and thus any joint distribution over X_0, X_1, Y_0 and Y_1 can be realised in the causal structure 1(b). By contrast, any correlations in the original causal structure 1(a) must obey the no-signalling constraints over the observed nodes A, B, X and Y since A does not influence Y and B does not influence X in this causal structure. The inequalities derived by Braunstein and Caves follow by applying Properties P1-3 to the variables {X_0, X_1, Y_0, Y_1}. The derived relations hold for the classical causal structure (and not necessarily for the quantum and GPT cases) because only in the classical case does it make sense to consider a joint distribution over these four variables, which in the quantum and GPT cases do not co-exist (cf. Section II B). It is worth remarking that without post-selection, no quantum-violatable constraints exist for this causal structure [15]. The BC inequalities (6) are entropic Bell inequalities, i.e., they hold for every classical entropy vector in the post-selected causal structure 1(b). There are four BC inequalities. It has been shown in [20] that these four inequalities are complete in the following sense (the lemma below is implied by Corollary V.3 in [20]).

Lemma 1. A distribution in the postselected Bell scenario with binary A and B is entropically classical if and only if it satisfies the four BC inequalities (6).
It turns out that in the (2, 2, 2, 2) Bell scenario, non-classical distributions that do not violate the BC inequalities can be made to do so with some additional post-processing, as shown in [17]. We review this result below before analysing the same question in the (2, 2, 3, 3) scenario.
The current section summarises the relevant results of [17] regarding the sufficiency of entropic inequalities in the (2, 2, 2, 2) scenario. As previously mentioned, it is possible for a non-classical distribution to have the same entropy vector as a classical one and hence to be entropically classical. For example, the maximally non-classical distribution in probability space, p_PR (Equation (5)), is entropically classical since it has the same entropy vector as the classical distribution p_C and hence cannot violate any of the BC inequalities. However, the distribution (1/2) p_PR + (1/2) p_C maximally violates I^4_BC ≤ 0, attaining a value of ln 2. That convex mixtures of non-violating distributions can lead to a violation is due to the fact that entropic inequalities are non-linear in the underlying probabilities (in contrast to the facet Bell inequalities in probability space).
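The effect of mixing can be reproduced numerically. The Python sketch below rests on two assumptions that are labelled here because the corresponding equations are not reproduced in this text: the classical distribution is taken to be perfectly correlated uniform outputs for every input pair, and the BC expression is taken in the standard form H(X_1Y_1) + H(X_0) + H(Y_0) − H(X_0Y_0) − H(X_0Y_1) − H(X_1Y_0), which singles out the input pair (1, 1). Whether this is the expression the paper labels I^4_BC is an assumption; the function names are hypothetical.

```python
import math
from itertools import product

def shannon(probs):
    """Shannon entropy in nats of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def bc_value(p, n_out=2):
    """Assumed BC expression: H(X1Y1) + H(X0) + H(Y0) - H(X0Y0) - H(X0Y1) - H(X1Y0)."""
    outs = range(n_out)

    def h_joint(a, b):
        return shannon([p[(x, y, a, b)] for x, y in product(outs, repeat=2)])

    h_x0 = shannon([sum(p[(x, y, 0, 0)] for y in outs) for x in outs])
    h_y0 = shannon([sum(p[(x, y, 0, 0)] for x in outs) for y in outs])
    return h_joint(1, 1) + h_x0 + h_y0 - h_joint(0, 0) - h_joint(0, 1) - h_joint(1, 0)

def mix(p1, p2, v):
    """Convex mixture v*p1 + (1 - v)*p2 of two distributions with identical keys."""
    return {k: v * p1[k] + (1 - v) * p2[k] for k in p1}

keys = list(product(range(2), repeat=4))  # (x, y, a, b)
p_pr = {(x, y, a, b): 0.5 if (x ^ y) == a * b else 0.0 for (x, y, a, b) in keys}
p_corr = {(x, y, a, b): 0.5 if x == y else 0.0 for (x, y, a, b) in keys}  # assumed p_C

value_pr = bc_value(p_pr)                      # no violation on its own
value_corr = bc_value(p_corr)                  # no violation on its own
value_mix = bc_value(mix(p_pr, p_corr, 0.5))   # equal mixture
```

In the equal mixture the outputs for inputs (1, 1) become uniform over all four outcome pairs, so H(X_1Y_1) rises from ln 2 to 2 ln 2 while every other term is unchanged, which gives the value ln 2.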
In [17], it was shown that such a procedure is possible for every non-classical distribution in the (d, d, 2, 2) Bell scenario with d ≥ 2, i.e., for every such distribution, there exists an NCNI transformation such that the resultant distribution violates a Shannon entropic BC inequality (6). Thus, non-classicality can be detected in this scenario by processing the observed correlations (in a way that cannot increase the non-classicality) before using a BC entropic inequality on the result. In this sense the BC entropic inequalities provide a necessary and sufficient test for non-classicality in these scenarios.
In more detail, for the (d, d, 2, 2) case this works as follows. First, one defines a special class of distribution, an isotropic distribution, as
p^k_{iso,ε} := ε p^k_PR + (1 − ε) p_noise
for some k ∈ [8] and ε ∈ [0, 1], where p^k_PR is the k-th PR box and p_noise is white noise, i.e., the distribution with all entries equal to 1/4. In the (2, 2, 2, 2) Bell scenario the isotropic distribution p^k_{iso,ε} is non-classical if and only if ε > 1/2. The NCNI transformation used in [17] involves first transforming the observed distribution into an isotropic distribution through a local depolarisation procedure that maintains its non-classicality. Second, it is shown that for any non-classical isotropic distribution, i.e., a p^k_{iso,ε} with ε > 1/2, there exists a classical distribution p^k_C such that the mixture p^k_{E,v} := v p^k_{iso,ε} + (1 − v) p^k_C violates one of the BC entropic inequalities for sufficiently small v > 0. In particular, the value of I^k_BC for p^k_{E,v} can be expanded for small v in terms of a function f(ε) that is independent of v (see [17] for details). Thus for any ε > 1/2, the corresponding isotropic distributions are non-classical and taking v arbitrarily small can make I^k_BC positive, which is a violation of the entropic inequality. We summarise the main result of [17] for (2, 2, 2, 2) Bell scenarios in the following Theorem (which is implicit in [17]).
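The threshold 1/2 for the isotropic family can be checked in probability space. The Python sketch below is illustrative only: it uses one standard probability form of the CHSH expression, p(X=Y|00) + p(X=Y|01) + p(X=Y|10) + p(X≠Y|11) with local bound 3, which is assumed here and may differ in normalisation from the form used in the paper. The function names are hypothetical.

```python
from itertools import product

KEYS = list(product(range(2), repeat=4))  # (x, y, a, b)

def pr_box():
    """p(xy|ab) = 1/2 if x XOR y = a*b, and 0 otherwise."""
    return {(x, y, a, b): 0.5 if (x ^ y) == a * b else 0.0 for (x, y, a, b) in KEYS}

def isotropic(eps):
    """eps * p_PR + (1 - eps) * white noise, where the noise has all entries 1/4."""
    pr = pr_box()
    return {k: eps * pr[k] + (1 - eps) * 0.25 for k in KEYS}

def chsh_value(p):
    """Assumed CHSH form: total probability of x XOR y = a*b over the four input pairs.

    Local distributions give at most 3; the PR box gives 4.
    """
    return sum(prob for (x, y, a, b), prob in p.items() if (x ^ y) == a * b)

at_threshold = chsh_value(isotropic(0.5))   # exactly on the local bound
above = chsh_value(isotropic(0.75))         # violates the bound
below = chsh_value(isotropic(0.25))         # satisfies the bound
```

For this family the expression evaluates to 2 + 2ε, so the bound 3 is exceeded exactly when ε > 1/2. The remaining seven CHSH inequalities are relabellings on which this family takes smaller values, so they do not change the threshold.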
Theorem 2. For every non-classical distribution p_{XY|AB} in the (2, 2, 2, 2) Bell scenario, there exists an NCNI transformation T such that T(p_{XY|AB}) violates one of the BC entropic inequalities (6).
One of the aims of the present paper is to study whether this result extends to the case where the number of outcomes per party is more than two. In general, the (2, 2, d, d) Bell polytope for d > 2 has new, distinct classes of Bell inequalities and extremal non-signalling vertices other than the CHSH inequalities and the PR boxes. In the following, we analyse this problem for the d = 3 case, for which it is helpful to first describe the (2, 2, 3, 3) scenario in probability space.
F. The (2,2,3,3) Bell scenario in probability space
In the (2, 2, 3, 3) Bell scenario, there are two classes of Bell inequality that completely characterize the local polytope: the CHSH inequality and the CGLMP inequality (10) of [21], in which all the random variables take values in {0, 1, 2} and all additions and subtractions of the random variables are modulo 3. The (2, 2, 3, 3) local polytope has a total of 1116 facets, 36 of which correspond to positivity constraints, 648 to (lifted) CHSH facets (these are equivalent to first coarse-graining two of the outputs for each party into one, where the coarse-graining can depend on the input, and then applying one of the eight (2, 2, 2, 2) CHSH inequalities), and the remaining 432 are CGLMP facets [22]. Converting this facet description to the vertex description (e.g., using the porta software [23]) one can obtain all the vertices of the (2, 2, 3, 3) non-signalling polytope. This yields 81 local deterministic vertices, 648 PR-box type vertices and 432 new extremal non-signalling vertices that maximally violate each of the CGLMP inequalities. We call these new vertices the CGLMP-vertices and denote by p_NL the specific vertex that maximally violates (10). The corresponding isotropic distribution is p^{(2,2,3,3)}_{iso,ε} := ε p_NL + (1 − ε) p^{(2,2,3,3)}_noise, where p^{(2,2,3,3)}_noise is the uniform distribution with all entries equal to 1/9 and ε ∈ [0, 1]. In order to show the insufficiency of entropic inequalities, one needs to identify at least one non-classical distribution whose non-classicality cannot be detected through entropic inequalities. We will discuss this for the class p^{(2,2,3,3)}_{iso,ε} in what follows (by symmetry all the arguments will also hold for isotropic distributions corresponding to relabelled versions of p_NL and the corresponding BC inequalities).
In the (2, 2, 3, 3) case, there is a one-to-one correspondence between the PR box vertices and the CHSH inequalities and between the CGLMP vertices (local relabellings of p_NL) and the CGLMP inequalities (in the sense that each PR-box/CGLMP vertex violates exactly one CHSH/CGLMP inequality). However, there is not a similar one-to-one correspondence between all Bell inequalities and all extremal non-local vertices: p_NL maximally violates both the CGLMP Inequality (10) and the lifted CHSH inequality whose evaluation is equivalent to applying the output coarse-graining 0 ↦ 0, 1 ↦ 1 and 2 ↦ 1 for each party and then evaluating (4), for instance.

Using Shannon entropy
In the entropic picture of the (2, 2, 3, 3) scenario, the four BC Inequalities (6) still hold (these are valid independently of the cardinality of the random variables). Again, analogously to the (2, 2, 2, 2) case, the maximally non-local distribution p_NL has the same entropy vector as the classical distribution p^{(2,2,3,3)}_C (amongst others). The distribution p_NL is hence entropically classical. However, in contrast to the (2, 2, 2, 2) case, we have evidence suggesting that there are values of ε for which p^{(2,2,3,3)}_{iso,ε} is non-classical, but such that the mixture v p^{(2,2,3,3)}_{iso,ε} + (1 − v) p_L is entropically classical for all classical distributions p_L and all v ∈ [0, 1], i.e., there exist non-classical distributions in the (2, 2, 3, 3) scenario for which mixing with classical distributions never gives rise to a non-classical entropy vector.
We begin by considering mixing p^{(2,2,3,3)}_{iso,ε} with the classical distribution p^{(2,2,3,3)}_C, in analogy with the treatment of the (2, 2, 2, 2) case. Although we have not fully proven this, this mixing appears to be optimal in the sense that when it does not allow for entropic violations, no other mixing can either. This allows us to identify a range of ε for which the mixture p^{(2,2,3,3)}_{iso,ε} is non-classical, yet appears to remain entropically classical even when mixed with arbitrary classical distributions. We begin with two propositions whose proofs can be found in Appendix C.
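Proposition 2. For ε ≤ 4/7, the distribution p^{(2,2,3,3)}_{E,ε,v} := v p^{(2,2,3,3)}_{iso,ε} + (1 − v) p^{(2,2,3,3)}_C does not violate any of the BC entropic inequalities (6) for any v ∈ [0, 1]. However, for ε > 4/7, there always exists a v ∈ [0, 1] such that the entropic inequality I^4_BC ≤ 0 is violated by p^{(2,2,3,3)}_{E,ε,v}.
Corollary. For ε ≤ 4/7, the distribution p^{(2,2,3,3)}_{E,ε,v} is entropically classical for all v ∈ [0, 1].
Proof. This follows from Proposition 2 and Lemma 1.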
While Proposition 2 shows that the proof strategy of [17] does not directly generalize to all non-classical distributions in the (2, 2, 3, 3) case, it does not rule out the possibility that there may exist other mixings with classical distributions or more general NCNI transformations that could transform p^{(2,2,3,3)}_{iso,ε} for ε ∈ (1/2, 4/7] into a distribution that violates one of the BC inequalities. To this effect, we considered all possible mixings of p^{(2,2,3,3)}_{iso,ε} with classical distributions, numerically optimized I^k_BC over this polytope for each k ∈ [4], and were unable to find violations. Given the non-linear nature of the objective function, it is possible that the numerical approach missed the true optimum. Nevertheless, this is evidence for the following conjecture and is presented in more detail in Appendix B. Proposition 2 along with the evidence and figures of Appendices A and B also suggest this conjecture.
Conjecture 1. Let ε ≤ 4/7. For all mixtures of the distribution p^{(2,2,3,3)}_{iso,ε} with classical distributions in the (2, 2, 3, 3) Bell scenario, the resulting distribution is entropically classical, i.e., all distributions in Conv({p^{(2,2,3,3)}_{iso,ε}} ⋃ {p_{L,k}}_k) are entropically classical, where {p_{L,k}}_k are the 81 local deterministic vertices.
The interesting cases of Conjecture 1 are for non-classical distributions (i.e., for ε > 1/2), and the most relevant of these are those that can be achieved in quantum theory. The next two remarks address this case.
Remark 1. There exist quantum achievable, non-classical distributions whose non-classicality can be detected through entropic inequalities. This is because the quantum distribution considered in [21, Equation (14)] is quantum achievable since it can be obtained from the density operator u|ψ′⟩⟨ψ′| + (1 − u) I/9 (where |ψ′⟩ is the two-qutrit state producing p′_QM) and the same quantum measurements that produce p′_QM from |ψ′⟩.

Using Tsallis entropies
Given the results (Proposition 2 and Conjecture 1) of the previous section for Shannon entropic inequalities, a natural question is whether other entropic measures can provide an advantage over the Shannon entropy in detecting non-classicality. Here, we look at Tsallis entropies and find that similar results hold in this case as well, suggesting that Tsallis entropies also do not allow us to completely solve the problem.
For a classical random variable X distributed according to the discrete probability distribution p_X, the order q Tsallis entropy of X for a real parameter q is defined as [24]
S_q(X) = −∑_{x: p_x>0} p_x^q ln_q p_x,
where p_x := p_X(x). In this expression we have used the q-logarithm function ln_q p_x = (p_x^{1−q} − 1)/(1 − q), which converges to the natural logarithm function as q → 1. This means that lim_{q→1} S_q(X) = H(X) and hence that S_q(X) is continuous in q. We henceforth write ∑_x instead of ∑_{x: p_x>0} with the implicit understanding that probability zero events are excluded from the sum.
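The definition of the Tsallis entropy through the q-logarithm, and its Shannon limit, can be illustrated with a short Python sketch. The function names and the example distribution are hypothetical; the case q = 1 is handled separately by returning the natural logarithm, which is the limit stated above.

```python
import math

def q_log(p, q):
    """q-logarithm ln_q(p) = (p^(1-q) - 1) / (1 - q); natural logarithm at q = 1."""
    if q == 1:
        return math.log(p)
    return (p ** (1 - q) - 1) / (1 - q)

def tsallis_entropy(probs, q):
    """S_q = -sum_x p_x^q ln_q(p_x), excluding zero-probability events."""
    return -sum(p ** q * q_log(p, q) for p in probs if p > 0)

probs = [0.5, 0.3, 0.2]                       # hypothetical distribution
shannon = tsallis_entropy(probs, 1)           # q = 1 reproduces H(X) in nats
near_one = tsallis_entropy(probs, 1 + 1e-6)   # close to the Shannon value
order_two = tsallis_entropy(probs, 2)         # equals 1 - sum_x p_x^2
```

For q ≠ 1 the definition simplifies to S_q(X) = (1 − ∑_x p_x^q)/(q − 1), which for q = 2 gives 1 − (0.25 + 0.09 + 0.04) = 0.62 in this example.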
The Tsallis entropies for q > 1 satisfy many of the same properties as the Shannon entropy. In particular, monotonicity, strong subadditivity and the chain rule all hold for Tsallis entropies for all q > 1 [25, 26], making these polymatroids like the Shannon entropy. (Other generalized entropies such as Rényi and min/max entropies do not satisfy one or more of these properties in general and it is not clear whether the analogues of (6) hold for these.) The aforementioned properties are sufficient to derive the BC inequalities, which hence also hold for Tsallis entropy. In other words, the analogues of (6) with H replaced by S_q hold for all q ≥ 1, and we refer to these as the Tsallis entropic BC inequalities (14). Entropic classicality in Tsallis entropy space can be defined analogously to Definition 1, in terms of Tsallis entropy vectors over the set of variables S (Equation (2)). We say that a distribution is q-entropically classical if its entropy vector written in terms of the Tsallis entropy of order q is achievable using a classical distribution. In the case of the Shannon entropy, we used the fact (Lemma 1) that the BC Inequalities (6) are known to be necessary and sufficient for entropic classicality for 2-input Bell scenarios [20]. However, it is not clear if the result of [20] generalises to Tsallis entropies for q > 1. Thus our results in the Tsallis case are weaker than those for Shannon, being stated only for the BC inequalities. We leave the generalization to arbitrary Tsallis entropic inequalities as an open problem.
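Proposition 3. For ε ≤ 4/7, the distribution p^{(2,2,3,3)}_{E,ε,v} = v p^{(2,2,3,3)}_{iso,ε} + (1 − v) p^{(2,2,3,3)}_C does not violate any of the Tsallis BC inequalities (14) for any v ∈ [0, 1] and q > 1. However, for ε > 4/7 and every q > 1, there always exists a v ∈ [0, 1] such that the entropic inequality I^4_{BC,q} ≤ 0 is violated by p^{(2,2,3,3)}_{E,ε,v}.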
We refer the reader to Appendix C for a proof of this Proposition. To investigate the extension to other mixings, we tried the same computational procedure (see Appendix B) as in the Shannon case. We found no violation of the Tsallis entropic BC inequalities for any mixings of p^{(2,2,3,3)}_{iso,ε} with classical distributions, for several values of q > 1 and ε ∈ (1/2, 4/7]. This leads us to conjecture (Conjecture 2), in analogy with Conjecture 1, that for ε ≤ 4/7 and q > 1 no mixture of p^{(2,2,3,3)}_{iso,ε} with classical distributions violates a Tsallis entropic BC inequality.
Figure 3a shows the values of ε and v for which I^4_{BC,q=2} evaluated with p^{(2,2,3,3)}_{E,ε,v} is positive, which is also suggestive of this conjecture.
Remark 3. Any impossibility result for the (2, 2, 3, 3) scenario also holds in the (2, 2, d, d) case for d > 3 because the former is always embedded in the latter, i.e., every distribution in the (2, 2, 3, 3) scenario has a corresponding distribution in all the (2, 2, d, d) scenarios with d > 3, which can be obtained by assigning a zero probability to the additional outcomes. Further, the entropic Inequalities (6) remain the same for all these scenarios as they do not depend on the cardinality of the random variables involved. Thus the existence of non-classical distributions for the d = 3 case whose non-classicality cannot be detected by entropic inequalities implies the same result for all d > 3.
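The embedding used in Remark 3 can be illustrated directly: padding a distribution with zero-probability outcomes leaves every entropy unchanged, because such outcomes do not contribute to the sums. The Python sketch below is a hedged illustration with hypothetical function names; it uses uniform white noise with all entries 1/9 as the d = 3 example and embeds it in d = 5.

```python
import math
from itertools import product

def embed(p, d_old, d_new):
    """Pad a two-input distribution keyed by (x, y, a, b) with zero-probability outcomes."""
    return {(x, y, a, b): (p[(x, y, a, b)] if x < d_old and y < d_old else 0.0)
            for x, y, a, b in product(range(d_new), range(d_new), range(2), range(2))}

def joint_entropy(p, d, a, b):
    """Shannon entropy in nats of the outputs (X, Y) for the input pair (a, b)."""
    return -sum(p[(x, y, a, b)] * math.log(p[(x, y, a, b)])
                for x, y in product(range(d), repeat=2) if p[(x, y, a, b)] > 0)

# Example with d = 3: white noise, all entries equal to 1/9.
p3 = {(x, y, a, b): 1 / 9 for x, y, a, b in product(range(3), range(3), range(2), range(2))}
p5 = embed(p3, 3, 5)  # same distribution viewed in the (2, 2, 5, 5) scenario

h3 = {(a, b): joint_entropy(p3, 3, a, b) for a, b in product(range(2), repeat=2)}
h5 = {(a, b): joint_entropy(p5, 5, a, b) for a, b in product(range(2), repeat=2)}
```

The same argument applies to the marginal entropies and to Tsallis entropies, since both sums also exclude zero-probability events.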

B. Beyond classical mixings: other NCNI operations
So far, we only considered mixing with classical distributions to obtain entropic violations and gave evidence that this does not work for some non-classical distributions in the (2, 2, 3, 3) scenario. This motivates us to study whether using other NCNI operations (or convex combinations thereof) allows us to detect this non-classicality through entropic violations. We show in this section that if Conjectures 1 and 2 hold then these other operations do not help. First consider the following example.
The maximum possible violation of the BC inequalities in the (2, 2, 2, 2) case is I^4_BC = ln 2 [17]. This is derived by considering only Shannon inequalities within the coexisting sets, and the bound that the maximum entropy of a binary variable is ln 2. An analogous proof holds in the (2, 2, 3, 3) case, except that the bound is then ln 3. In the former case, for ε = 1 and v = 1/2, we have p_E = (1/2) p_PR + (1/2) p_C, which maximally violates I^4_BC ≤ 0, while in the latter case, one such distribution is formed by (p_NL + p*_NL + p^{(2,2,3,3)}_C)/3, where p*_NL is another extremal non-local distribution in the (2, 2, 3, 3) non-signalling polytope and is a relabelling of p_NL, Since the equal mixture (p_NL + p . One could then consider whether for ε ∈ (1  2b for an illustration). The corresponding results also hold for the Tsallis case with q > 1, i.e., Proposition 3 also holds with p(2,2,3,3) iso replacing p^{(2,2,3,3)}_{iso,ε} (see also Figure 3). These suggest that mixing with relabellings in addition to mixing with classical distributions may also not help to violate entropic inequalities when ε ≤ 4/7. We investigate this in the rest of this section.
Here we consider output coarse-grainings (combining outputs) and input/output relabellings in addition to mixing with classical distributions, and convex combinations thereof. None of these operations can increase the non-classicality of a distribution [27]. We find that these also do not help in the entropic approach, i.e., if Conjectures 1 and 2 hold, then the non-classicality of p^{(2,2,3,3)}_{iso,ε} for ε ∈ (1/2, 4/7] cannot be detected by arbitrary Shannon entropic inequalities or Tsallis entropic BC inequalities even after pre-processing with one of these NCNI operations.
In the (2, 2, 3, 3) scenario, coarse-graining corresponds to combining 2 of the 3 outputs into a single output. This could be done for all four input choices (A, B) = {(0, 0), (0, 1), (1, 0), (1, 1)} or for only some of them. We find that there are 81, 108, 54 and 12 distinct coarse-grainings when the outcomes of either 4, 3, 2 or 1 input choices are coarse-grained. Thus there are a total of 255 coarse-grainings of p^{(2,2,3,3)}_{iso,ε}. There are also 432 relabellings of p^{(2,2,3,3)}_{iso,ε}, which we denote by {p_{R,j}}_j, j ∈ [432] (this set includes p^{(2,2,3,3)}_{iso,ε} = p_{R,1}). The set of all distributions that can be achieved through a convex mixture of p^{(2,2,3,3)}_{iso,ε} with its coarse-grainings, relabellings and classical distributions is a convex polytope Π_ε for each ε, which is the convex hull of these 255 + 432 + 81 = 768 points. We present the results for coarse-grainings and relabellings separately below. Firstly, we show that all the coarse-grainings {p This is proven in Appendix C. Note that Proposition 4 implies that Π_ε = Conv({p_{R,j}}_j ⋃ {p_{L,k}}_k) ∀ ε ≤ 4/7, and that it is not necessary to consider coarse-grainings. We now prove that if Conjectures 1 and 2 hold for p  ) does not violate a Shannon or Tsallis (q > 1) entropic BC inequality.
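The coarse-graining counts quoted in this section can be reproduced by direct enumeration. The Python sketch below rests on an assumption made here for illustration: the four "input choices" are taken to be the four local settings (A = 0, A = 1, B = 0, B = 1), each of which is either left untouched or has one of the three possible pairs of outputs merged. Under that reading the enumeration matches the quoted numbers. The variable names are hypothetical.

```python
from itertools import product
from math import comb

# For each of the four local settings: None (outputs untouched) or the pair of outputs merged.
options = [None, (0, 1), (0, 2), (1, 2)]

# Every assignment that coarse-grains at least one setting.
assignments = [c for c in product(options, repeat=4)
               if any(m is not None for m in c)]

# Number of coarse-grainings that act on exactly k of the four settings.
by_count = {k: sum(1 for c in assignments if sum(m is not None for m in c) == k)
            for k in (4, 3, 2, 1)}

# Closed form for comparison: choose k settings, then one of 3 mergings for each.
closed_form = {k: comb(4, k) * 3 ** k for k in (4, 3, 2, 1)}

n_coarse_grainings = len(assignments)         # 4**4 - 1
total_points = n_coarse_grainings + 432 + 81  # plus relabellings and local vertices
```

The enumeration gives 81, 108, 54 and 12 coarse-grainings acting on 4, 3, 2 and 1 settings respectively, 255 in total, and 768 points once the 432 relabellings and 81 local deterministic vertices are added.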

IV. DISCUSSION
The results and observations of Section III A suggest that in the (2, 2, 3, 3) scenario, the set of distributions for which post-processing via an arbitrary mixing enables non-classicality detection with a Shannon BC entropic inequality is the same as the set that enables non-classicality detection with a Tsallis BC entropic inequality. However, when entropic detection of non-classicality is possible, using Tsallis entropy can make it easier to do this detection in the sense that there is a wider range of mixings that achieve this (see Figure 3). As a specific example, consider the distribution p^{(2,2,3,3)}_{E,ε=0.7,v=0.4}. Figures 3a and 2a indicate that this distribution violates the Tsallis entropic inequality I^4_{BC,q} ≤ 0 but does not violate the Shannon entropic inequality I^4_BC ≤ 0. However, we can always further mix p^{(2,2,3,3)}_{E,ε=0.7,v=0.4} with p^{(2,2,3,3)}_C to obtain a distribution that violates the Shannon entropic inequality I^4_BC ≤ 0. Crucially, we have evidence that there are distributions in the (2, 2, 3, 3) scenario for which other NCNI operations including coarse-graining, input/output relabelling, mixing with classical distributions, or convex combinations of these do not enable detection of non-classicality with arbitrary Shannon entropic inequalities or Tsallis entropic BC inequalities. This is in contrast to the (2, 2, 2, 2) scenario [17], where for any non-classical distribution, there is always a simple NCNI operation that results in a distribution violating one of the Shannon BC inequalities.
Although we considered a range of NCNI operations, strictly speaking the key property of the considered transformations is that they never map a classical distribution to a non-classical one. Thus it would be interesting to extend the present results to other maps with this property, which need not be NCNI. In principle there could be a non-linear map of this form that allows the entropic BC inequalities to detect a wider range of non-classical distributions. It would be interesting to see whether, for any non-classical distribution $p^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with $1/2 < \epsilon \le 4/7$ (conjectured to be entropically classical with respect to the NCNI operations considered), one of these more general operations would allow its non-classicality to be detected entropically. We leave this as an open question.
In [28] it was shown that Tsallis entropic inequalities can detect non-classicality undetectable by Shannon entropic inequalities in the (2, 2, 2, 2) and (2, 2, 3, 3) Bell scenarios. This was without consideration of mixing with classical distributions or other post-processing on the distributions, so is not in conflict with [17]. As mentioned above, when mixing is not considered, we also find examples where it is advantageous to use Tsallis entropy in the (2, 2, 3, 3) scenario.
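For readers who want to experiment with the two entropy measures compared in this discussion, the following is a minimal sketch of the Shannon and Tsallis entropies of a probability vector. The function names are hypothetical, and the use of the natural logarithm is an assumption chosen so that the Tsallis entropy recovers the Shannon entropy in the limit $q \to 1$; it is not taken from the text. The sketch covers only the entropies themselves, not the BC expressions.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def tsallis_entropy(probs, q):
    """Tsallis entropy of order q, (1 - sum_i p_i^q) / (q - 1); Shannon at q = 1."""
    if q == 1:
        return shannon_entropy(probs)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

uniform3 = [1 / 3, 1 / 3, 1 / 3]   # uniform distribution over three outputs
h_shannon = shannon_entropy(uniform3)
h_tsallis_2 = tsallis_entropy(uniform3, 2.0)
h_tsallis_near_1 = tsallis_entropy(uniform3, 1.0 + 1e-6)
print(h_shannon, h_tsallis_2, h_tsallis_near_1)
```

For the uniform distribution on three outcomes, the order-2 Tsallis entropy is $1 - 3 \cdot (1/3)^2 = 2/3$, while the value for $q$ close to 1 approaches $\ln 3$.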
Proof. Consider the function $f : (0, 1) \times (0, 1) \to \mathbb{R}$ given by […] where we implicitly extend the domain to $[0, 1] \times [0, 1]$ by taking the relevant limit. The Shannon entropic expression $I^4_{\mathrm{BC}}(\epsilon, v)$ evaluated for the distribution $p$ […] (seen as a function of $\epsilon$ and $v$) is then given as […] Thus all the following arguments for $f(\epsilon, v)$ also hold for $I^4_{\mathrm{BC}}$. We first use that for $c > 0$ and $a \in \mathbb{R}$, for sufficiently small $v$ we have […].
Further, using the expression for the derivative of $f(\epsilon, v)$ with respect to $v$ in Equation (C3), we find that for $\epsilon > 4/7$, $\lim_{v \to 0} \frac{\partial}{\partial v} f(\epsilon, v) = \infty$. Thus, since $f(\epsilon, v) = 0$ for $v = 0$, sufficiently close to $v = 0$ there exists a $v$ such that $f(\epsilon, v) > 0$. This proves the claim.

[…] does not violate any of the Tsallis BC inequalities (14) for any $v \in [0, 1]$ and $q > 1$. However, for $\epsilon > 4/7$ and every $q > 1$, there always exists a $v \in [0, 1]$ such that the entropic inequality $I^4_{\mathrm{BC},q} \le 0$ is violated by $p$ […] (seen as a function of $q$, $\epsilon$ and $v$) is given as […] For $q > 1$, the following arguments for $g(q, \epsilon, v)$ also hold for $I^4_{\mathrm{BC},q}$. Note that […]

Corollary 5. $p^{i,j}_{\mathrm{mix},\epsilon} := \frac{1}{2} p_{R,i} + \frac{1}{2} p_{R,j}$ is local $\forall j \ne i$ if and only if $\epsilon \le 4/7$.
Theorem 6 (Bemporad et al. 2001 [32]). Let $P$ and $Q$ be polytopes with vertex sets $V$ and $W$ respectively, i.e., $P = \mathrm{Conv}(V)$ and $Q = \mathrm{Conv}(W)$. Then $P \cup Q$ is convex if and only if the line segment $[v, w]$ is contained in $P \cup Q$ $\forall v \in V$ and $w \in W$.
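To build intuition for the vertex-segment criterion of Theorem 6, the following sketch applies it to two-dimensional convex polygons. It is an illustration under stated assumptions, not the method used in the proofs: the polygons are toy examples, the function names are hypothetical, vertices are assumed to be listed counter-clockwise, and each segment is tested only at finitely many sample points, so the check is a numerical heuristic rather than an exact test (which would typically require linear programming).

```python
def in_convex_polygon(pt, verts, tol=1e-9):
    """True if pt lies in the convex polygon whose vertices are listed counter-clockwise."""
    n = len(verts)
    for i in range(n):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % n]
        # A negative cross product means pt is strictly to the right of this edge.
        cross = (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1)
        if cross < -tol:
            return False
    return True

def union_looks_convex(V, W, samples=201):
    """Sampled criterion: every segment [v, w], v in V and w in W, stays in P union Q."""
    for v in V:
        for w in W:
            for s in range(samples):
                t = s / (samples - 1)
                pt = (v[0] + t * (w[0] - v[0]), v[1] + t * (w[1] - v[1]))
                if not (in_convex_polygon(pt, V) or in_convex_polygon(pt, W)):
                    return False
    return True

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted_right = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)]     # union is a rectangle
shifted_diagonal = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]  # union has a notch

rectangle_case = union_looks_convex(unit_square, shifted_right)
notched_case = union_looks_convex(unit_square, shifted_diagonal)
print(rectangle_case, notched_case)
```

In the second example the segment from $(1, 0)$ to $(1.5, 0.5)$ leaves both squares, which is exactly the kind of witness the theorem says must exist when the union is not convex.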
Let $P_j = \mathrm{Conv}(\{p_{R,j}\} \cup \{p_{L,k}\}_k)$ and $\mathcal{P}$ be the set of polytopes $\mathcal{P} := \{P_j\}_j$. We use the above theorem to prove the final result that establishes Propositions 5 and 6.

Lemma 7. Let $V_j$ be the vertex set of the polytope $P_j \in \mathcal{P}$ and $V := \bigcup_j V_j$. Then, $\bigcup_{P_j \in \mathcal{P}} P_j = \mathrm{Conv}(\bigcup_j V_j) = \mathrm{Conv}(V) = \Pi_\epsilon$ $\forall \epsilon \le 4/7$.
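Proof. By Corollary 5, for $i \ne j$ we have that $\frac{1}{2} p_{R,i} + \frac{1}{2} p_{R,j} \in \mathcal{L}^{(2,2,3,3)} = P_i \cap P_j$ $\forall \epsilon \le 4/7$. Since this midpoint lies in both polytopes, convexity of $P_i$ places the half-segment from $p_{R,i}$ to the midpoint in $P_i$, and convexity of $P_j$ places the other half in $P_j$. This implies that $\alpha p_{R,i} + (1 - \alpha) p_{R,j} \in P_i \cup P_j$ $\forall \epsilon \le 4/7$, $\alpha \in [0, 1]$, i.e., the line segment $[p_{R,i}, p_{R,j}]$ is completely contained in the union of the corresponding polytopes $P_i \cup P_j$. Note that all other line segments $[v_i, v_j]$ with $v_i \in V_i$ and $v_j \in V_j$ are contained in $P_i \cup P_j$ by construction, since at least one of $v_i$ or $v_j$ would be a local-deterministic vertex. Therefore, by Theorem 6, $P_i \cup P_j$ is convex $\forall i, j$ and $\epsilon \le 4/7$. We can then apply Proposition 7 and Theorem 6 to the convex polytopes $P_i \cup P_j$ and $P_k$ and show that $P_i \cup P_j \cup P_k$ is convex $\forall i, j, k$ and $\epsilon \le 4/7$. Proceeding in this way, we conclude that $\bigcup_{P_i \in \mathcal{P}} P_i$ is convex $\forall \epsilon \le 4/7$, and hence $\bigcup_{P_j \in \mathcal{P}} P_j = \mathrm{Conv}(\bigcup_j V_j) = \mathrm{Conv}(V) = \Pi_\epsilon$.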

FIG. 1: (a) The bipartite Bell causal structure. The nodes $A$ and $B$ represent the random variables corresponding to independently chosen inputs, while $X$ and $Y$ represent the random variables corresponding to the outputs. $\Lambda$ is an unobserved node representing the common cause of $X$ and $Y$. (b) The post-selected Bell causal structure for two parties. The observed nodes $X_a$ represent the outputs when the input is $a \in \{0, 1\}$, and likewise for $Y_b$. Note that $X_0$ and $X_1$ are never simultaneously observed, and likewise $Y_0$ and $Y_1$.

[…])/2 violates $I^4_{\mathrm{BC}} \le 0$ non-maximally, one may be motivated to use the non-local distribution pNL $= (p_{\mathrm{NL}} + p^*_{\mathrm{NL}})/2$ in place of $p_{\mathrm{NL}}$ in the definition of $p$ […]

[…] then they continue to hold even when we consider arbitrary convex combinations with classical distributions and local relabellings of $p^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$.

Proposition 5. Let $\epsilon \le 4/7$. If Conjecture 1 holds, then every distribution in $\Pi_\epsilon$ is Shannon entropically classical.

Proposition 6. Let $\epsilon \le 4/7$. If Conjecture 2 holds, then every distribution in $\Pi_\epsilon$ satisfies the Tsallis entropic BC Inequalities (14) $\forall q > 1$.

These are proven in Appendix C and give the following corollary.

Proposition 3. For $\epsilon \le 4/7$, $p$ […]

The Tsallis entropic expression $I^4_{\mathrm{BC},q}(\epsilon, v)$ evaluated for the distribution $p$ […]

[…] denote the eight extremal non-signalling vertices equivalent under local relabellings to $p_{\mathrm{PR}}$ by $p^k_{\mathrm{PR}}$, $k \in [8]$, where $p^1_{\mathrm{PR}} := p_{\mathrm{PR}}$. Note that the 8 CHSH inequalities $\{I^k_{\mathrm{CHSH}}\}$ are in one-to-one correspondence with these 8 extremal non-signalling points, i.e., each $p^k_{\mathrm{PR}}$ violates exactly one CHSH inequality and each CHSH inequality is violated by exactly one $p^k_{\mathrm{PR}}$.
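The one-to-one correspondence between PR boxes and CHSH inequalities can be checked by direct enumeration. The sketch below assumes binary inputs and outputs and uses a standard parametrisation that is not taken from the text: a PR box labelled by bits $(\alpha, \beta, \gamma)$ outputs uniformly random $x, y$ with $x \oplus y = ab \oplus \alpha a \oplus \beta b \oplus \gamma$, and the CHSH expression with the same label is $\sum_{a,b} (-1)^{ab \oplus \alpha a \oplus \beta b \oplus \gamma} E_{ab}$ with classical bound 2, where $E_{ab}$ is the correlator. The function names and the labelling are hypothetical and need not match the ordering $k \in [8]$ used in the text.

```python
from itertools import product

def pr_box(alpha, beta, gamma):
    """p(x, y | a, b) = 1/2 when x XOR y equals ab XOR alpha*a XOR beta*b XOR gamma."""
    return {
        (x, y, a, b): 0.5 if (x ^ y) == ((a & b) ^ (alpha & a) ^ (beta & b) ^ gamma) else 0.0
        for x, y, a, b in product((0, 1), repeat=4)
    }

def chsh_value(p, alpha, beta, gamma):
    """Signed sum of the four correlators E_ab; classical distributions give at most 2."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        corr = sum((-1) ** (x ^ y) * p[(x, y, a, b)] for x, y in product((0, 1), repeat=2))
        sign = (-1) ** ((a & b) ^ (alpha & a) ^ (beta & b) ^ gamma)
        total += sign * corr
    return total

labels = list(product((0, 1), repeat=3))  # eight (alpha, beta, gamma) triples

# For each PR box, list the CHSH inequalities it violates.
violations = {
    box: [ineq for ineq in labels if chsh_value(pr_box(*box), *ineq) > 2]
    for box in labels
}
for box, violated in violations.items():
    print(box, violated)
```

With this parametrisation each box reaches the value 4 on the inequality carrying its own label and stays at or below 2 on the other seven.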
[…] with $d = 3$] (call this $p_{\mathrm{QM}}$) violates the Shannon entropic BC inequality $I^3_{\mathrm{BC}} \le 0$ when mixed with the classical distribution $p$ […] $\epsilon = 4/7$ […] $\cup \{p_{L,k}\}_k\})$, and in this case our results suggest that the non-classicality of the corresponding distributions cannot be detected through entropic inequalities, as we now explain. Consider the relabelling of the distribution $p_{\mathrm{QM}}$ (cf. Remark 1) that corresponds to flipping Bob's input; call this $p'_{\mathrm{QM}}$. Then $p'_{\mathrm{QM}}$ is quantum achievable and violates $I^4_{\mathrm{BC}} \le 0$ through mixing with $p^{(2,2,3,3)}$ […] 1]), we found that for some values of $u$ (e.g., $u = 7/10$), $p_{\mathrm{mix}}(u)$ is non-classical and belongs to the polytope Conv($p^{(2,2,3,3)}$ […]

[…] $\{p_{\mathrm{CG},i}\}_i$ of $p$ […]

The distribution $p_{\mathrm{CG},i}$ is classical for all $i$ if and only if $\epsilon \le 4/7$.