
synthetic-pretrained-disentangled-from-entangled-v1 Analysis


Main Pattern

This collection still shows a separation between task success and latent-strategy alignment, but the alignment is less uniformly weak than in the direct-pretrained counterpart.

Across the 5 completed runs:

  • sampled test accuracy is saturated at 1.0
  • mean router probe accuracy is 0.500
  • mean posterior-mu probe accuracy is 0.649
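The probe metrics above ask a simple question: can a small classifier read the latent strategy label back out of a representation? The report does not specify the probe family, so the following is a minimal nearest-centroid sketch of the idea; the function name and the toy data are illustrative, not the collection's actual probe setup.

```python
# Minimal "probe" sketch: fit class centroids on held-in latents, then
# measure how often held-out latents land nearest their own strategy's
# centroid. Chance level for two balanced strategies is 0.5.

def centroid_probe_accuracy(train, test):
    """train/test: lists of (feature_vector, strategy_label) pairs."""
    # One centroid per strategy label.
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    # Classify each test point by nearest centroid (squared distance).
    def nearest(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[y])))

    return sum(nearest(x) == y for x, y in test) / len(test)

# Toy latents: strategy 0 clusters near (0, 0), strategy 1 near (1, 1).
train = [([0.1, 0.0], 0), ([0.0, 0.2], 0), ([0.9, 1.1], 1), ([1.0, 0.8], 1)]
test = [([0.05, 0.1], 0), ([1.1, 0.9], 1)]
print(centroid_probe_accuracy(train, test))  # → 1.0
```

A router probe accuracy of 0.500 on this kind of metric means the router-side representation carries no linearly recoverable strategy signal beyond chance.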

Relative to ss-disentangled-direct-pretrained-v1, the strongest shift is on the posterior side. On the shared 5 completed tasks, the from-entangled collection improves mean posterior probe by about +0.172, while mean router probe improves by a smaller +0.072. That suggests the entangled initialization may help the posterior organize strategy information more clearly than it helps the router recover the same structure at test time.

Task-Level Reading

The improvement is not uniform.

  • base_conversion is the clearest positive case. Router probe rises from about 0.360 in the direct-pretrained collection to 0.769 here, and posterior probe reaches 1.000.
  • linear_equation_solving also shows a large posterior-side gain, from about 0.478 to 0.717, while router probe stays roughly flat.
  • list_summation, grid_pathfinding, and multidigit_addition do not show a comparable router-side improvement, and in two of those tasks the router probe is slightly lower than in the direct-pretrained collection.

So the best cautious summary is not “initializing from the entangled model solves the problem,” but “it helps some tasks substantially, especially posterior-side structure, while leaving the router-side gap partly intact.”

Loss Interpretation

As in the direct-pretrained collection, the active objective here is still just baseline-normalized reconstruction plus 0.1 * KL prior. Posterior variance and router marginal KL remain diagnostics only, because their weights are zero.

That matters for interpretation: the large posterior-variance values (roughly 388 to 2237) and the large router-marginal-KL values (roughly 2434 to 20305) are not being optimized directly. They can therefore grow even while the actual optimized objective behaves cleanly.
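The loss structure described here can be sketched as follows. The function and key names are illustrative stand-ins, not the collection's actual config; the point is that the zero-weighted terms contribute nothing to the value being minimized.

```python
# Sketch of the loss structure described above (names illustrative).
# Only reconstruction + 0.1 * KL(prior) carries weight; posterior-variance
# and router-marginal-KL have zero weight and are diagnostics only.

def total_loss(metrics, kl_prior_weight=0.1,
               posterior_var_weight=0.0, router_marginal_kl_weight=0.0):
    """metrics: dict of raw per-batch loss terms."""
    optimized = (metrics["baseline_normalized_reconstruction_loss"]
                 + kl_prior_weight * metrics["kl_prior"])
    # Zero-weighted terms can grow arbitrarily without affecting training.
    diagnostics = (posterior_var_weight * metrics["posterior_variance"]
                   + router_marginal_kl_weight * metrics["router_marginal_kl"])
    return optimized + diagnostics

batch = {"baseline_normalized_reconstruction_loss": 1.0,
         "kl_prior": 2.5,
         "posterior_variance": 2237.0,   # large, but weight is zero
         "router_marginal_kl": 20305.0}  # large, but weight is zero
print(total_loss(batch))  # → 1.25
```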

Important Caveat On Reconstruction Scale

The normalized reconstruction metric is harder to interpret across collections here than within this collection.

In the direct-pretrained collection, reconstruction was normalized by a pretrained model that had not been mixture-fine-tuned. In this collection, disentangling starts from an entangled source run that is already much stronger on the mixture distribution. That makes baseline_normalized_reconstruction_loss a stricter reference. So the much larger normalized reconstruction values here should not be read naively as evidence that training is worse in an absolute sense.
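The scale effect described above can be made concrete with hypothetical numbers: the same model reconstruction loss looks much worse when divided by a stronger (lower-loss) baseline.

```python
# Illustration of the baseline-choice caveat. All numbers hypothetical:
# the same model loss yields very different normalized values depending
# on the strength of the reference model.

def baseline_normalized(model_recon_loss, baseline_recon_loss):
    return model_recon_loss / baseline_recon_loss

model_loss = 0.8
weak_baseline = 2.0    # e.g. pretrained, not mixture-fine-tuned
strong_baseline = 0.8  # e.g. an entangled source run, strong on the mixture

print(baseline_normalized(model_loss, weak_baseline))    # → 0.4
print(baseline_normalized(model_loss, strong_baseline))  # → 1.0
```

So identical absolute reconstruction quality can read as 0.4 against one baseline and 1.0 against another, which is why the larger normalized values in this collection are not direct evidence of worse training.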

Within this collection, the new normalized-reconstruction-versus-router-probe plot adds a useful descriptive pattern: lower final normalized reconstruction tends to coincide with higher router probe. Across the 5 completed tasks the correlation is strongly negative, about -0.97, but that pattern is driven heavily by base_conversion, which is the one task that both beats the entangled baseline very strongly (0.029) and shows the clearest router-side gain (0.769). The remaining four tasks cluster near normalized reconstruction around 1.0 and router probe around 0.39-0.48. So the plot is suggestive of a link between stronger latent-conditioned reconstruction and better router structure, but the sample is too small and too task-dependent to treat that as a firm general rule yet.
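The outlier sensitivity noted above can be sketched directly. In the code below, only the base_conversion pair (0.029, 0.769) comes from the report; the other four points are hypothetical stand-ins for the cluster near (1.0, 0.39-0.48), chosen to show how a single extreme task can dominate a five-point correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# base_conversion first (from the report); the rest are illustrative.
norm_recon = [0.029, 0.98, 1.02, 1.00, 1.05]
router_probe = [0.769, 0.41, 0.48, 0.39, 0.44]

full = pearson(norm_recon, router_probe)
without_outlier = pearson(norm_recon[1:], router_probe[1:])
print(round(full, 2), round(without_outlier, 2))  # → -0.97 0.54
```

With these stand-in values the correlation collapses from strongly negative to weak and unstable once base_conversion is dropped, which is the sense in which the -0.97 figure should not be treated as a general rule.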

Updated Interpretation

This collection weakens the strongest version of the “pretrained backbone simply bypasses z” story, because initialization from the entangled source can produce much cleaner latent structure on at least some tasks. At the same time, it does not eliminate the broader objective-mismatch concern.

The most grounded reading is:

  • entangled initialization can preserve or expose more strategy-relevant structure than direct-pretrained disentangling alone
  • that benefit shows up more reliably in posterior probes than in router probes
  • the router is still not consistently learning a clean strategy-coded latent across tasks under the current reconstruction-plus-KL objective

So this collection provides evidence that initialization matters, but not yet evidence that the representation problem is solved.
