
synthetic-pretrained-disentangled-from-entangled-v1 Design


Experiment Metadata

  • Collection: synthetic-pretrained-disentangled-from-entangled-v1
  • Run date: collection created 2026-04-14T01:15:35, last updated 2026-04-14T15:43:59
  • Collection definition: <.slurmkit/collections/synthetic-pretrained-disentangled-from-entangled-v1.yaml>
  • Source run directory: <.runs/synthetic_sequences/disentangled/synthetic-pretrained-disentangled-from-entangled-v1>
  • Collection commit ID: all 5 completed runs record git commit 017e5e4
  • Planned design: 6 tasks x 1 setting x 1 seed = 6 jobs
  • Observed completion at analysis time: 5/6 complete, 1/6 failed

Training Design

  • Backbone family: pretrained Qwen2.5-0.5B
  • Initialization path: start from pretrained-backed entangled source runs rather than directly from the base pretrained model
  • Entangled source collection: synthetic-pretrained-entangled-v1
  • Model source kind: entangled_run
  • Entangled adaptation mode: lora
  • Disentangling adaptation mode: lora
  • Latent setup: continuous, latent dim 8, last-token pooling
  • Training budget: 1 epoch, max_sequences_per_epoch=200000, seed 314159
  • Batching: batch_size=128, eval_batch_size=128
  • Optimization: adamw, lr=2e-4, beta=0.1
  • Reconstruction scaling: baseline-normalized reconstruction enabled
  • Diagnostic-only auxiliary terms: posterior_variance_weight=0.0, router_marginal_kl_to_prior_weight=0.0, router_support_weight=0.0, token_weighted_reconstruction_weight=0.0, inter_latent_divergence_weight=0.0
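For orientation, the optimizer-step budget implied by the settings above can be computed directly. This is a rough sketch, not a trace of the actual trainer: it assumes no gradient accumulation and that any partial final batch is dropped.

```python
# Rough per-epoch step estimate from the training-budget settings above.
# Assumptions (not stated in the report): no gradient accumulation,
# partial final batch dropped.
max_sequences_per_epoch = 200_000
batch_size = 128

steps_per_epoch = max_sequences_per_epoch // batch_size
print(steps_per_epoch)  # 1562 full batches; 64 sequences left over
```

With 1 epoch and these batch settings, each run sees on the order of ~1.6k optimizer steps per task.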

Analysis Scope

  • Quantitative summaries include the 5 completed tasks:
    • list_summation
    • grid_pathfinding
    • linear_equation_solving
    • base_conversion
    • multidigit_addition
  • Excluded task: sorting_algorithms
    • Exclusion reason: Slurm-marked failure (CUDA out-of-memory)

Comparison Target

This collection is the entangled-initialized counterpart to ss-disentangled-direct-pretrained-v1.

The main design difference is initialization:

  • direct-pretrained: disentangling starts directly from the pretrained Qwen backbone
  • from-entangled: disentangling starts from a pretrained-backed model that was first fine-tuned on the mixture distribution

Because the normalized reconstruction baseline is inherited from a stronger source model here, reconstruction-scale comparisons against the direct-pretrained collection should be treated cautiously.
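To make the caveat concrete, here is a hypothetical illustration of why inherited baselines complicate cross-collection comparison. The `normalized_recon` helper and the numeric values are illustrative assumptions, not the report's actual normalization code or measurements: the point is only that the same raw reconstruction loss maps to a larger normalized value when the baseline comes from a stronger (lower-loss) source model.

```python
# Hypothetical sketch: baseline-normalized reconstruction divides the raw
# reconstruction loss by a baseline loss measured on the source model.
# Function name and numbers are illustrative assumptions.
def normalized_recon(raw_loss: float, baseline_loss: float) -> float:
    return raw_loss / baseline_loss

raw = 0.8  # same raw reconstruction loss in both collections (assumed)

# Direct-pretrained: weaker source model, higher baseline loss.
print(normalized_recon(raw, baseline_loss=2.0))  # -> 0.4

# From-entangled: stronger fine-tuned source, lower baseline loss.
print(normalized_recon(raw, baseline_loss=1.0))  # -> 0.8
```

Identical raw losses thus yield different normalized values purely because of the inherited baseline, which is why normalized reconstruction should not be compared at face value across the two collections.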
