
synthetic-pretrained-disentangled-from-entangled-v1 Design


Experiment Metadata

  • Collection: synthetic-pretrained-disentangled-from-entangled-v1
  • Run date: collection created 2026-04-14T01:15:35, last updated 2026-04-14T15:43:59
  • Collection definition: <.slurmkit/collections/synthetic-pretrained-disentangled-from-entangled-v1.yaml>
  • Source run directory: <.runs/synthetic_sequences/disentangled/synthetic-pretrained-disentangled-from-entangled-v1>
  • Collection commit ID: all 5 completed runs record git commit 017e5e4
  • Planned design: 6 tasks x 1 setting x 1 seed = 6 jobs
  • Observed completion at analysis time: 5/6 complete, 1/6 failed

Training Design

  • Backbone family: pretrained Qwen2.5-0.5B
  • Initialization path: start from pretrained-backed entangled source runs rather than directly from the base pretrained model
  • Entangled source collection: synthetic-pretrained-entangled-v1
  • Model source kind: entangled_run
  • Entangled adaptation mode: lora
  • Disentangling adaptation mode: lora
  • Latent setup: continuous, latent dim 8, last-token pooling
  • Training budget: 1 epoch, max_sequences_per_epoch=200000, seed 314159
  • Batching: batch_size=128, eval_batch_size=128
  • Optimization: adamw, lr=2e-4, beta=0.1
  • Reconstruction scaling: baseline-normalized reconstruction enabled
  • Diagnostic-only auxiliary terms: posterior_variance_weight=0.0, router_marginal_kl_to_prior_weight=0.0, router_support_weight=0.0, token_weighted_reconstruction_weight=0.0, inter_latent_divergence_weight=0.0
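For orientation, the optimizer-step budget implied by the settings above can be computed directly. This is a rough sketch, not a trace of the actual trainer: it assumes no gradient accumulation and that any partial final batch is dropped.

```python
# Rough per-epoch step estimate from the training-budget settings above.
# Assumptions (not stated in the report): no gradient accumulation,
# partial final batch dropped.
max_sequences_per_epoch = 200_000
batch_size = 128

steps_per_epoch = max_sequences_per_epoch // batch_size
print(steps_per_epoch)  # 1562 full batches; 64 sequences left over
```

With 1 epoch and these batch settings, each run sees on the order of ~1.6k optimizer steps per task.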

Analysis Scope

  • Quantitative summaries include the 5 completed tasks:
    • list_summation
    • grid_pathfinding
    • linear_equation_solving
    • base_conversion
    • multidigit_addition
  • Excluded task: sorting_algorithms
    • Exclusion reason: Slurm-marked failure (CUDA out-of-memory)

Comparison Target

This collection is the entangled-initialized counterpart to ss-disentangled-direct-pretrained-v1.

The main design difference is initialization:

  • direct-pretrained: disentangling starts directly from the pretrained Qwen backbone
  • from-entangled: disentangling starts from a pretrained-backed model that was first fine-tuned on the mixture distribution

Because the normalized reconstruction baseline is inherited from a stronger source model here, reconstruction-scale comparisons against the direct-pretrained collection should be treated cautiously.
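To make the caveat concrete, here is a hypothetical illustration of why inherited baselines complicate cross-collection comparison. The `normalized_recon` helper and the numeric values are illustrative assumptions, not the report's actual normalization code or measurements: the point is only that the same raw reconstruction loss maps to a larger normalized value when the baseline comes from a stronger (lower-loss) source model.

```python
# Hypothetical sketch: baseline-normalized reconstruction divides the raw
# reconstruction loss by a baseline loss measured on the source model.
# Function name and numbers are illustrative assumptions.
def normalized_recon(raw_loss: float, baseline_loss: float) -> float:
    return raw_loss / baseline_loss

raw = 0.8  # same raw reconstruction loss in both collections (assumed)

# Direct-pretrained: weaker source model, higher baseline loss.
print(normalized_recon(raw, baseline_loss=2.0))  # -> 0.4

# From-entangled: stronger fine-tuned source, lower baseline loss.
print(normalized_recon(raw, baseline_loss=1.0))  # -> 0.8
```

Identical raw losses thus yield different normalized values purely because of the inherited baseline, which is why normalized reconstruction should not be compared at face value across the two collections.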
