
synthetic-pretrained-disentangled-from-entangled-v1 Results


Final Snapshot

From <experiments/synthetic_sequences/disentangled/analysis/aggregates/synthetic-pretrained-disentangled-from-entangled-v1/data/final_snapshot.csv>:

  • completed runs: 5
  • failed runs: 1
  • mean sampled test final-answer accuracy: 1.0000
  • mean router-sampled probe accuracy: 0.5001
  • mean posterior-mu probe accuracy: 0.6491

Per-task final metrics:

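| Task | Final accuracy | Router probe acc. | Posterior-mu probe acc. | Baseline-normalized recon. | KL prior | Posterior variance | Router marginal KL | Total loss |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| list_summation | 1.0000 | 0.3906 | 0.4318 | 0.9652 | 0.0058 | 409.50 | 2433.84 | 0.9658 |
| grid_pathfinding | 1.0000 | 0.4351 | 0.4669 | 1.0599 | 0.0047 | 452.51 | 3237.50 | 1.0603 |
| linear_equation_solving | 1.0000 | 0.4287 | 0.7169 | 0.9719 | 0.0019 | 1592.67 | 14631.87 | 0.9721 |
| base_conversion | 1.0000 | 0.7690 | 1.0000 | 0.0288 | 1.4288 | 387.87 | 2761.59 | 0.1717 |
| multidigit_addition | 1.0000 | 0.4772 | 0.6300 | 1.0241 | 2.6638 | 2236.91 | 20304.72 | 1.2905 |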
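The summary means in the Final Snapshot are consistent with an unweighted average over the five per-task rows: for example, (0.3906 + 0.4351 + 0.4287 + 0.7690 + 0.4772) / 5 ≈ 0.5001 for the router probe. The sketch below recomputes them in Python, assuming one completed run per task. The dictionary keys (`final_acc`, `router_probe`, `posterior_probe`) are hypothetical stand-ins for the columns of `final_snapshot.csv`, whose actual schema is not shown here.

```python
from statistics import mean

# Per-task final values copied from the table above.
# Assumption: one completed run per task. The keys are hypothetical column
# names; the real final_snapshot.csv schema may differ.
PER_TASK = {
    "list_summation":          {"final_acc": 1.0, "router_probe": 0.3906, "posterior_probe": 0.4318},
    "grid_pathfinding":        {"final_acc": 1.0, "router_probe": 0.4351, "posterior_probe": 0.4669},
    "linear_equation_solving": {"final_acc": 1.0, "router_probe": 0.4287, "posterior_probe": 0.7169},
    "base_conversion":         {"final_acc": 1.0, "router_probe": 0.7690, "posterior_probe": 1.0000},
    "multidigit_addition":     {"final_acc": 1.0, "router_probe": 0.4772, "posterior_probe": 0.6300},
}


def summarize(per_task: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted mean of each metric over the completed tasks.

    The failed run has no row here, so it does not enter the average.
    """
    metrics = next(iter(per_task.values())).keys()
    return {m: mean(row[m] for row in per_task.values()) for m in metrics}


if __name__ == "__main__":
    for name, value in summarize(PER_TASK).items():
        print(f"mean {name}: {value:.4f}")
```

Averaging per task, rather than pooling over individual samples, gives every task equal weight regardless of its test-set size; whether the report's aggregation does exactly this is an assumption, though it reproduces the reported 1.0000, 0.5001, and 0.6491.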

Figures

  • Validation accuracy and alignment dynamics by task: validation final-answer accuracy, router probe accuracy, and posterior-mu probe accuracy for each task.
  • Optimized objective terms by task: baseline-normalized reconstruction, KL prior, and total objective for each task.
  • Unweighted diagnostics by task: posterior-variance and router-marginal-KL diagnostics for each task.
  • Standardized comparison of objective versus unweighted diagnostics: standardized trajectories comparing the optimized objective terms against the unweighted diagnostics.
  • Final accuracy versus router probe across tasks: final-answer accuracy against router probe accuracy for the completed tasks.
  • Normalized reconstruction versus router probe across tasks: final baseline-normalized reconstruction loss against router probe accuracy for the completed tasks, colored by posterior probe accuracy.
