synthetic-pretrained-disentangled-from-entangled-v1 Analysis
Main Pattern
This collection still shows a separation between task success and latent-strategy alignment, but it is less uniformly weak than the direct-pretrained counterpart.
Across the 5 completed runs:
- sampled test accuracy is saturated at 1.0
- mean router probe accuracy is 0.500
- mean posterior-mu probe accuracy is 0.649
Relative to ss-disentangled-direct-pretrained-v1, the strongest shift is on the posterior side. On the shared 5 completed tasks, the from-entangled collection improves mean posterior probe by about +0.172, while mean router probe improves by a smaller +0.072. That suggests the entangled initialization may help the posterior organize strategy information more clearly than it helps the router recover the same structure at test time.
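The cross-collection comparison above reduces to a mean of per-task probe-accuracy deltas over the shared completed tasks. A minimal sketch of that arithmetic (the task names and accuracies below are illustrative placeholders, not the actual run metrics):

```python
def mean_probe_delta(from_entangled, direct_pretrained):
    """Mean per-task probe-accuracy delta over tasks present in both collections."""
    shared = sorted(set(from_entangled) & set(direct_pretrained))
    return sum(from_entangled[t] - direct_pretrained[t] for t in shared) / len(shared)

# Illustrative placeholder accuracies (not the real per-task values).
direct = {"task_a": 0.40, "task_b": 0.50, "task_c": 0.45}
entangled_init = {"task_a": 0.55, "task_b": 0.52, "task_c": 0.48}

print(round(mean_probe_delta(entangled_init, direct), 3))  # mean of +0.15, +0.02, +0.03
```

The same function applied once to router-probe accuracies and once to posterior-probe accuracies yields the two deltas quoted above.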
Task-Level Reading
The improvement is not uniform.
`base_conversion` is the clearest positive case. Router probe rises from about 0.360 in the direct-pretrained collection to 0.769 here, and posterior probe reaches 1.000. `linear_equation_solving` also shows a large posterior-side gain, from about 0.478 to 0.717, while router probe stays roughly flat. `list_summation`, `grid_pathfinding`, and `multidigit_addition` do not show a comparable router-side improvement, and in two of those tasks the router probe is slightly lower than in the direct-pretrained collection.
So the best cautious summary is not “initializing from the entangled model solves the problem,” but “it helps some tasks substantially, especially posterior-side structure, while leaving the router-side gap partly intact.”
Loss Interpretation
As in the direct-pretrained collection, the active objective here is still just baseline-normalized reconstruction plus 0.1 * KL prior. Posterior variance and router marginal KL remain diagnostics only because their weights are zero.
That matters for interpretation:
- the large posterior-variance values, roughly 388 to 2237
- the large router-marginal-KL values, roughly 2434 to 20305
are not being optimized directly. They can therefore grow even while the actual optimized objective behaves cleanly.
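The split between optimized terms and zero-weighted diagnostics can be sketched as follows. This is a hypothetical stand-in, not the actual training code: the variable names are assumptions, and only the 0.1 KL weight is taken from the description above.

```python
def disentangling_loss(recon_loss, baseline_recon_loss, kl_prior,
                       posterior_variance, router_marginal_kl):
    """Sketch of the active objective described above (names are assumed).

    Only normalized reconstruction and the KL-prior term carry weight;
    posterior variance and router marginal KL have weight 0.0, so they are
    logged as diagnostics and can drift freely without affecting gradients.
    """
    normalized_recon = recon_loss / baseline_recon_loss
    loss = normalized_recon + 0.1 * kl_prior
    diagnostics = {
        "posterior_variance": posterior_variance,   # weight 0.0, monitored only
        "router_marginal_kl": router_marginal_kl,   # weight 0.0, monitored only
    }
    return loss, diagnostics
```

Because the last two quantities never enter `loss`, large values there are consistent with a cleanly decreasing optimized objective.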
Important Caveat On Reconstruction Scale
The normalized reconstruction metric is harder to interpret across collections here than within this collection.
In the direct-pretrained collection, reconstruction was normalized by a pretrained model that had not been mixture-fine-tuned. In this collection, disentangling starts from an entangled source run that is already much stronger on the mixture distribution. That makes baseline_normalized_reconstruction_loss a stricter reference. So the much larger normalized reconstruction values here should not be read naively as evidence that training is worse in an absolute sense.
Within this collection, the new normalized-reconstruction-versus-router-probe plot adds a useful descriptive pattern: lower final normalized reconstruction tends to coincide with higher router probe. Across the 5 completed tasks the correlation is strongly negative, about -0.97, but that pattern is driven heavily by base_conversion, which is the one task that both beats the entangled baseline very strongly (0.029) and shows the clearest router-side gain (0.769). The remaining four tasks cluster near normalized reconstruction around 1.0 and router probe around 0.39-0.48. So the plot is suggestive of a link between stronger latent-conditioned reconstruction and better router structure, but the sample is too small and too task-dependent to treat that as a firm general rule yet.
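The quoted figure is an ordinary Pearson correlation over five (normalized reconstruction, router probe) pairs. A pure-Python sketch with illustrative values shaped like the pattern described (one strong outlier task plus four clustered tasks; these are not the actual run metrics) also shows how fragile the estimate is at n=5:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Illustrative values: one task with very low normalized reconstruction and a
# high router probe, four tasks clustered near recon ~1.0 and probe ~0.4-0.5.
recon = [0.03, 0.98, 1.00, 1.02, 0.99]
probe = [0.77, 0.42, 0.48, 0.39, 0.45]

r_all = pearson(recon, probe)                  # strongly negative
r_no_outlier = pearson(recon[1:], probe[1:])   # much weaker without the outlier
```

Dropping the single outlier point collapses the correlation, which is the quantitative form of the caution above: the -0.97 figure is real but largely carried by one task.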
Updated Interpretation
This collection weakens the strongest version of the “pretrained backbone simply bypasses z” story, because initialization from the entangled source can produce much cleaner latent structure on at least some tasks. At the same time, it does not eliminate the broader objective-mismatch concern.
The most grounded reading is:
- entangled initialization can preserve or expose more strategy-relevant structure than direct-pretrained disentangling alone
- that benefit shows up more reliably in posterior probes than in router probes
- the router is still not consistently learning a clean strategy-coded latent across tasks under the current reconstruction-plus-KL objective
So this collection provides evidence that initialization matters, but not yet evidence that the representation problem is solved.