synthetic-pretrained-disentangled-from-entangled-v1 Design
Experiment Metadata
- Collection: synthetic-pretrained-disentangled-from-entangled-v1
- Run date: collection created 2026-04-14T01:15:35, last updated 2026-04-14T15:43:59
- Collection definition: <.slurmkit/collections/synthetic-pretrained-disentangled-from-entangled-v1.yaml>
- Source run directory: <.runs/synthetic_sequences/disentangled/synthetic-pretrained-disentangled-from-entangled-v1>
- Collection commit ID: all 5 completed runs record git commit 017e5e4
- Planned design: 6 tasks x 1 setting x 1 seed = 6 jobs
- Observed completion at analysis time: 5/6 complete, 1/6 failed
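The planned grid (6 tasks x 1 setting x 1 seed = 6 jobs) can be enumerated mechanically. A minimal sketch, assuming the task names listed later in this report and the collection's seed; the dict layout is illustrative, not the actual slurmkit job schema:

```python
from itertools import product

# Task names taken from the Analysis Scope section; setting/seed per the metadata.
tasks = [
    "list_summation", "grid_pathfinding", "linear_equation_solving",
    "base_conversion", "multidigit_addition", "sorting_algorithms",
]
settings = ["disentangled-from-entangled"]  # 1 setting
seeds = [314159]                            # 1 seed

# Cartesian product: 6 x 1 x 1 = 6 jobs
jobs = [
    {"task": t, "setting": s, "seed": seed}
    for t, s, seed in product(tasks, settings, seeds)
]
print(len(jobs))  # 6
```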
Training Design
- Backbone family: pretrained Qwen2.5-0.5B
- Initialization path: start from pretrained-backed entangled source runs rather than directly from the base pretrained model
- Entangled source collection: synthetic-pretrained-entangled-v1
- Model source kind: entangled_run
- Entangled adaptation mode: lora
- Disentangling adaptation mode: lora
- Latent setup: continuous, latent dim 8, last-token pooling
- Training budget: 1 epoch, max_sequences_per_epoch=200000, seed 314159
- Batching: batch_size=128, eval_batch_size=128
- Optimization: adamw, lr=2e-4, beta=0.1
- Reconstruction scaling: baseline-normalized reconstruction enabled
- Diagnostic-only auxiliary terms: posterior_variance_weight=0.0, router_marginal_kl_to_prior_weight=0.0, router_support_weight=0.0, token_weighted_reconstruction_weight=0.0, inter_latent_divergence_weight=0.0
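The latent setup (continuous, dim 8, last-token pooling) amounts to taking the hidden state at the final non-padding position of each sequence and projecting it to an 8-dimensional vector. A minimal stdlib sketch under those assumptions; the hidden width, projection, and function names are illustrative, not the run's actual code:

```python
import random

LATENT_DIM = 8
HIDDEN_DIM = 16  # illustrative; the real backbone is much wider

def last_token_pool(hidden, mask):
    """Pick the hidden state at the last position where mask == 1."""
    last = max(i for i, m in enumerate(mask) if m == 1)
    return hidden[last]

def project(vec, weights):
    """Plain linear projection hidden -> latent (no bias)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

random.seed(314159)  # the collection's seed, reused here for illustration
hidden = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(5)]
mask = [1, 1, 1, 0, 0]  # two padding positions at the end
weights = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)]
           for _ in range(LATENT_DIM)]

pooled = last_token_pool(hidden, mask)  # hidden state at token index 2
latent = project(pooled, weights)
print(len(latent))  # 8
```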
Analysis Scope
- Quantitative summaries include the 5 completed tasks: list_summation, grid_pathfinding, linear_equation_solving, base_conversion, multidigit_addition
- Excluded task: sorting_algorithms; exclusion reason: Slurm-marked failure with CUDA OOM
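The inclusion rule above is simple to mechanize: keep only runs whose scheduler state is COMPLETED. A minimal sketch, assuming hypothetical state strings rather than the actual Slurm accounting output:

```python
# Per-task terminal states as this analysis observed them (invented labels,
# loosely modeled on Slurm job state codes).
runs = {
    "list_summation": "COMPLETED",
    "grid_pathfinding": "COMPLETED",
    "linear_equation_solving": "COMPLETED",
    "base_conversion": "COMPLETED",
    "multidigit_addition": "COMPLETED",
    "sorting_algorithms": "FAILED",  # Slurm-marked failure, CUDA OOM
}

included = sorted(t for t, state in runs.items() if state == "COMPLETED")
print(len(included))  # 5
```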
Comparison Target
This collection is the entangled-initialized counterpart to ss-disentangled-direct-pretrained-v1.
The main design difference is initialization:
- direct-pretrained: disentangling starts directly from the pretrained Qwen backbone
- from-entangled: disentangling starts from a pretrained-backed model that was first fine-tuned on the mixture distribution
Because the normalized reconstruction baseline is inherited from a stronger source model here, reconstruction-scale comparisons against the direct-pretrained collection should be treated cautiously.
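This caveat can be made concrete. Under one plausible reading of baseline normalization (raw reconstruction loss divided by the source model's own baseline loss; the formula and all numbers below are invented for illustration), the same raw loss yields a larger normalized value when the baseline is stronger:

```python
def normalized_recon(raw_loss, baseline_loss):
    """Baseline-normalized reconstruction: raw loss relative to the
    source model's reconstruction loss on the same data (assumed form)."""
    return raw_loss / baseline_loss

raw = 0.50  # identical raw reconstruction loss in both collections (invented)

# direct-pretrained: weaker baseline, i.e. higher baseline loss
direct = normalized_recon(raw, baseline_loss=1.0)      # 0.5
# from-entangled: stronger baseline inherited from the fine-tuned source
from_ent = normalized_recon(raw, baseline_loss=0.625)  # 0.8

# Identical raw losses map to different normalized values, so the
# normalized scale is not comparable across the two collections.
print(direct, from_ent)  # 0.5 0.8
```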