Toy Model: Mixture over Deterministic Conditional Distributions

This note records the next toy-model direction in the strategy-identification thread. It is still a seed note, but it fills in the missing step 03 in the sequence.

Status

This is still a placeholder-level theory note. The main value right now is to pin down the intended toy model and the identification questions it is meant to answer.

Setup

Formalize a toy model where the conditional law $p(y \mid x)$ is a mixture over deterministic conditional distributions. That is, there exists a discrete latent strategy space $\mathcal{Z}$, a router $r(z \mid x)$, and a deterministic function

$$f : \mathcal{X} \times \mathcal{Z} \to \mathcal{Y}$$

such that

$$p(y \mid x) = \sum_{z \in \mathcal{Z}} r(z \mid x)\, \mathbf{1}[f(x, z) = y].$$

For a first pass, we may simplify further and allow $r(z \mid x) = r(z)$.

In this setting:

  • the router latent space is a discrete set of strategies;
  • the generator is deterministic given the input and strategy;
  • the model mirrors the synthetic data setting used in the experiments.
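A minimal simulation makes the object concrete. The sketch below is illustrative only: the sizes of $\mathcal{X}$ and $\mathcal{Z}$, the router weights, and the lookup-table choice of $f$ are all arbitrary assumptions, using the input-independent router $r(z)$ from the first-pass simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: X = {0, ..., 4}, Z = {0, 1, 2}, Y a set of integers.
N_X, N_Z = 5, 3

# Router r(z): input-independent, per the first-pass simplification.
r = np.array([0.5, 0.3, 0.2])

# Deterministic generator f(x, z): an arbitrary fixed lookup table.
f = rng.integers(0, 10, size=(N_X, N_Z))

def sample_y(x, rng):
    """Draw y ~ p(y | x): pick a strategy z ~ r, then apply f deterministically."""
    z = rng.choice(N_Z, p=r)
    return f[x, z]

def p_y_given_x(x):
    """The implied conditional law: mass r(z) on each atom f(x, z)."""
    probs = {}
    for z in range(N_Z):
        y = int(f[x, z])
        probs[y] = probs.get(y, 0.0) + r[z]
    return probs
```

Note that if two strategies collide on some input, i.e. $f(x, z) = f(x, z')$, their router masses merge into a single atom of $p(\cdot \mid x)$; this is exactly the kind of degeneracy the identification questions below have to contend with.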

Oracle identification question

Consider an oracle estimator or algorithm that knows the distribution $p(y \mid x)$ and tries to identify the underlying strategies by solving a constrained factorization problem such as

$$\max_{r, g} \left\{ I(z; y) : p(y \mid x) = \sum_{z} r(z \mid x)\, g(y \mid x, z) \right\}.$$

The main questions are:

  • Does this estimator recover the true underlying strategies, in a structural-identifiability sense, up to permutation or another natural equivalence on $\mathcal{Z}$?
  • Under what conditions is the factorization identifiable, and how does this depend on the properties of $f$ and $r$, or on the allowed router and generator classes?
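One simple sufficient condition can be illustrated directly: if the router is input-independent with pairwise-distinct weights, and $f(x, \cdot)$ is injective in $z$ for every $x$, then each conditional $p(\cdot \mid x)$ places the distinct mass $r(z)$ on the atom $f(x, z)$, so the strategies can be read off by matching masses. The sketch below implements that oracle recovery under exactly these assumptions (which are stronger than the note requires):

```python
import numpy as np

def recover_strategies(p_tables, r):
    """Read off f(x, z) from the exact conditionals p(y | x).

    p_tables: list over x of dicts {y: probability mass}.
    r: router weights, assumed input-independent and pairwise distinct.
    Also assumes no collisions f(x, z) = f(x, z'), so each atom's mass
    equals exactly one r[z].
    """
    r = np.asarray(r)
    f_hat = []
    for probs in p_tables:
        row = [None] * len(r)
        for y, mass in probs.items():
            # Match this atom's mass to the unique router weight it equals.
            z = int(np.argmin(np.abs(r - mass)))
            row[z] = y
        f_hat.append(row)
    return f_hat
```

Dropping any of these assumptions (tied router weights, input-dependent routing, or collisions $f(x, z) = f(x, z')$) breaks this direct mass-matching, and that is where the genuine identifiability questions begin.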

Connection to the learning objective

If the oracle factorization is well-defined, the next question is whether the actual training objective, such as a $\beta$-ELBO-style objective, recovers the same solution or a nearby one.
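For concreteness, one standard $\beta$-weighted form of such an objective (an assumption here; the note does not fix the exact objective) is

$$\mathcal{L}_\beta(r, g, q) = \mathbb{E}_{q(z \mid x, y)}\big[\log g(y \mid x, z)\big] - \beta\, \mathrm{KL}\big(q(z \mid x, y) \,\|\, r(z \mid x)\big),$$

maximized over the router $r$, generator $g$, and variational posterior $q$. Setting $\beta = 1$ recovers the usual ELBO, while larger $\beta$ penalizes posteriors that stray from the router.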

That leads to questions like:

  • When does the optimization objective select the same factorization as the oracle estimator?
  • How many samples are needed to recover the intended strategies?
  • What notion of closeness to the true strategies can be guaranteed as a function of sample size, the properties of $f$ and $r$, and the objective parameters such as $\beta$?

Intended use

This toy model is useful because it is simple enough to analyze while still matching the synthetic-strategy setting more closely than the earlier generic mixture notes.

Next

Return to the strategy identification index.
