synthetic-pretrained-disentangled-from-entangled-v1 Results
Final Snapshot
From <experiments/synthetic_sequences/disentangled/analysis/aggregates/synthetic-pretrained-disentangled-from-entangled-v1/data/final_snapshot.csv>:
- completed runs: 5
- failed runs: 1
- mean sampled test final-answer accuracy: 1.0000
- mean router-sampled probe accuracy: 0.5001
- mean posterior-mu probe accuracy: 0.6491
Per-task final metrics:
| Task | Final accuracy | Router probe | Posterior probe | Norm recon | KL prior | Posterior variance | Router marginal KL | Total loss |
|---|---|---|---|---|---|---|---|---|
| list_summation | 1.0000 | 0.3906 | 0.4318 | 0.9652 | 0.0058 | 409.50 | 2433.84 | 0.9658 |
| grid_pathfinding | 1.0000 | 0.4351 | 0.4669 | 1.0599 | 0.0047 | 452.51 | 3237.50 | 1.0603 |
| linear_equation_solving | 1.0000 | 0.4287 | 0.7169 | 0.9719 | 0.0019 | 1592.67 | 14631.87 | 0.9721 |
| base_conversion | 1.0000 | 0.7690 | 1.0000 | 0.0288 | 1.4288 | 387.87 | 2761.59 | 0.1717 |
| multidigit_addition | 1.0000 | 0.4772 | 0.6300 | 1.0241 | 2.6638 | 2236.91 | 20304.72 | 1.2905 |
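
As a sanity check, the snapshot means can be recomputed from the per-task rows. The sketch below is illustrative only: it assumes each completed run corresponds to exactly one task row, and the dictionary keys (`router`, `posterior`, `recon`, `kl`, `total`) are hypothetical names, not the column names in `final_snapshot.csv`. It also checks an inferred relationship: the rows are numerically consistent with total loss ≈ norm recon + 0.1 × KL prior, but the weight 0.1 is read off the numbers and is not stated in the source.

```python
# Per-task final metrics copied from the table above.
# Key names are hypothetical; they are not the CSV's column names.
ROWS = {
    "list_summation": dict(router=0.3906, posterior=0.4318, recon=0.9652, kl=0.0058, total=0.9658),
    "grid_pathfinding": dict(router=0.4351, posterior=0.4669, recon=1.0599, kl=0.0047, total=1.0603),
    "linear_equation_solving": dict(router=0.4287, posterior=0.7169, recon=0.9719, kl=0.0019, total=0.9721),
    "base_conversion": dict(router=0.7690, posterior=1.0000, recon=0.0288, kl=1.4288, total=0.1717),
    "multidigit_addition": dict(router=0.4772, posterior=0.6300, recon=1.0241, kl=2.6638, total=1.2905),
}

# ASSUMPTION: inferred from the table, not documented in the source.
ASSUMED_KL_WEIGHT = 0.1


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


router_mean = mean(row["router"] for row in ROWS.values())
posterior_mean = mean(row["posterior"] for row in ROWS.values())

# Difference between the reported total loss and recon + weight * KL, per task.
residuals = {
    task: row["total"] - (row["recon"] + ASSUMED_KL_WEIGHT * row["kl"])
    for task, row in ROWS.items()
}

print(f"mean router probe:    {router_mean:.4f}")
print(f"mean posterior probe: {posterior_mean:.4f}")
for task, residual in residuals.items():
    print(f"{task}: total - (recon + {ASSUMED_KL_WEIGHT} * kl) = {residual:+.5f}")
```

Under these assumptions the recomputed means round to the 0.5001 and 0.6491 reported in the snapshot, and every residual stays within the four-decimal rounding of the table.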
Figures
Caption: Validation final-answer accuracy, router probe accuracy, and posterior-mu probe accuracy by task.
Caption: Baseline-normalized reconstruction, KL prior, and total objective by task.
Caption: Posterior-variance and router-marginal-KL diagnostics by task.
Caption: Standardized trajectories comparing optimized objective terms against unweighted diagnostics.
Caption: Final answer accuracy versus router probe across completed tasks.
Caption: Final baseline-normalized reconstruction loss versus router probe across completed tasks, colored by posterior probe accuracy.