Exp1 Gaussian Mixture: Larger Beta Sweep (Concise Report)
Artifact source: path:experiments/gaussian_mixture/demos/artifacts/exp1_gaussian_mixture_high_density-larger-beta-sweep
Methodology
Notation
- $x \in \mathbb{R}^d$ is an observed sample and $z$ is a latent mode.
- $\mathbb{E}[\cdot]$ denotes expectation, $\mathrm{KL}(\cdot\,\|\,\cdot)$ denotes KL divergence, and $\mathrm{MSE}$ denotes mean squared error.
- $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian distribution and $\mathrm{Cat}(1/K)$ denotes a uniform categorical prior.
Synthetic data is sampled from a labeled Gaussian mixture: $z \sim \mathrm{Cat}(1/K)$, $x \mid z = k \sim \mathcal{N}(\mu_k, \sigma^2 I_d)$. The larger sweep uses $K = 3$, $d = 2$, $N = 240$ samples per sweep point, and $\beta \in \{0.10, 0.25, 0.50, 1.00, 2.00, 5.00\}$.
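A minimal sketch of the labeled-mixture sampler is below. The report specifies component std 0.7, separation 4.0, uniform weights, and seed 17; the circular placement of the component means and the function name `sample_gaussian_mixture` are assumptions for illustration, not the experiment's actual generator.

```python
import numpy as np

def sample_gaussian_mixture(n=240, k=3, std=0.7, separation=4.0, seed=17):
    """Sample n labeled 2-D points from a k-component Gaussian mixture.
    Component means are placed on a circle of radius `separation`
    (an illustrative choice); mixture weights are uniform."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(k) / k
    means = separation * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (k, 2)
    labels = rng.integers(0, k, size=n)                 # uniform Cat(1/k) modes
    x = means[labels] + std * rng.standard_normal((n, 2))
    return x, labels
```

Each sweep point would redraw such a dataset (or reuse one, depending on the runner) before training both VAE variants.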
Model definitions (priors + objectives)
Both variants optimize the $\beta$-weighted objective

$$\mathcal{L}_\beta = \underbrace{\mathbb{E}_{q(z \mid x)}\big[\|x - \hat{x}(z)\|^2\big]}_{\text{recon}} + \beta\,\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big).$$
Discrete latent VAE
- Posterior: $q(z = k \mid x) = \mathrm{softmax}_k(\ell_\theta(x))$
- Prior: uniform categorical $p(z = k) = 1/K$
- Implementation note: training uses exact softmax-weighted expectation over categories (no Gumbel-softmax sampling).
- Objective: $\mathcal{L}_\beta = \sum_{k=1}^{K} q(z = k \mid x)\,\|x - \hat{x}_k\|^2 + \beta\,\mathrm{KL}\big(q(z \mid x)\,\|\,\mathrm{Cat}(1/K)\big)$
- Reported history/summary metrics: `loss`, `recon`, `kl`, `nmi`, `clustering_accuracy`.
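The exact softmax-weighted objective from the implementation note can be sketched as follows. This is a hedged illustration, not the experiment's code: `decoded_modes` stands in for the decoder applied to each of the $K$ category embeddings, and the helper names are assumptions.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def discrete_beta_loss(x, logits, decoded_modes, beta):
    """Exact beta-ELBO for a categorical latent: the reconstruction term
    averages over all K category decodings with posterior weights (no
    Gumbel-softmax samples); the KL is against a uniform categorical prior."""
    q = softmax(logits)                                   # (n, K) posterior probs
    # Per-sample, per-category squared error ||x_i - decode(e_k)||^2:
    err = ((x[:, None, :] - decoded_modes[None, :, :]) ** 2).sum(-1)  # (n, K)
    recon = (q * err).sum(1).mean()
    K = q.shape[1]
    kl = (q * np.log(q * K + 1e-12)).sum(1).mean()        # KL(q || Cat(1/K))
    return recon + beta * kl, recon, kl
```

Because the expectation over categories is exact, gradients flow through the posterior weights directly, avoiding the variance of Gumbel-softmax sampling.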
Continuous latent VAE
- Posterior: $q(z \mid x) = \mathcal{N}\big(\mu_\phi(x), \mathrm{diag}(\sigma^2_\phi(x))\big)$
- Prior: standard Gaussian $p(z) = \mathcal{N}(0, I)$
- Objective: $\mathcal{L}_\beta = \mathbb{E}_{q(z \mid x)}\big[\|x - \hat{x}(z)\|^2\big] + \beta\,\mathrm{KL}\big(q(z \mid x)\,\|\,\mathcal{N}(0, I)\big)$
- Reported history/summary metrics: `loss`, `recon`, `kl`, `linear_probe_accuracy`.
Metric definitions
- `recon`: mean squared reconstruction error.
- `kl`: KL divergence term in the objective.
- `nmi`: normalized mutual information between predicted and true modes (discrete).
- `clustering_accuracy`: permutation-invariant label-alignment accuracy (discrete).
- `linear_probe_accuracy`: held-out linear-classifier accuracy from continuous latent means to true modes.
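Permutation-invariant clustering accuracy can be computed by finding the label permutation that maximizes agreement, via the Hungarian algorithm on the confusion matrix. A sketch, assuming `scipy` is available (the function name is illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true, pred):
    """Best-permutation accuracy: solve a max-weight matching between
    predicted cluster ids and true mode ids on the confusion matrix."""
    true, pred = np.asarray(true), np.asarray(pred)
    k = int(max(true.max(), pred.max())) + 1
    confusion = np.zeros((k, k), dtype=int)
    for t, p in zip(true, pred):
        confusion[t, p] += 1
    rows, cols = linear_sum_assignment(-confusion)   # negate to maximize matches
    return confusion[rows, cols].sum() / len(true)
```

This is why a predictor that labels every mode consistently but with shuffled ids still scores accuracy 1.0.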
Experimental Setup
| Category | Parameter | Value |
|---|---|---|
| Dataset | Components ($K$) | 3 |
| Dataset | Dimension ($d$) | 2 |
| Dataset | Samples per sweep point ($N$) | 240 |
| Dataset | Component std | 0.7 |
| Dataset | Component separation | 4.0 |
| Dataset | Mixture weights | None (uniform default) |
| Training | Epochs | 20 |
| Training | Batch size | 64 |
| Training | Hidden dim | 32 |
| Training | Latent dim (continuous) | 2 |
| Training | Optimizer | Adam |
| Training | Learning rate | |
| Sweep | $\beta$ grid | 0.10, 0.25, 0.50, 1.00, 2.00, 5.00 |
| Reproducibility | Seed | 17 |
| Reproducibility | Device | cpu |
| Outputs | Artifact root | path:experiments/gaussian_mixture/demos/artifacts/exp1_gaussian_mixture_high_density-larger-beta-sweep |
Quantitative results
Discrete latent summary
| $\beta$ | NMI | Acc | Recon | KL final |
|---|---|---|---|---|
| 0.10 | 0.762 | 0.729 | 6.488 | 1.020 |
| 0.25 | 0.762 | 0.729 | 6.548 | 0.895 |
| 0.50 | 0.735 | 0.717 | 6.679 | 0.670 |
| 1.00 | 0.720 | 0.704 | 7.103 | 0.359 |
| 2.00 | 0.720 | 0.704 | 7.788 | 0.035 |
| 5.00 | 0.690 | 0.700 | 7.901 | 0.001 |
Continuous latent summary
| $\beta$ | Probe Acc | Recon | KL final | Mean radius |
|---|---|---|---|---|
| 0.10 | 1.000 | 3.272 | 5.125 | 2.935 |
| 0.25 | 1.000 | 3.521 | 3.482 | 2.451 |
| 0.50 | 1.000 | 3.909 | 2.477 | 2.062 |
| 1.00 | 1.000 | 4.895 | 1.322 | 1.533 |
| 2.00 | 1.000 | 6.505 | 0.421 | 0.875 |
| 5.00 | 0.979 | 7.827 | 0.047 | 0.271 |
Figures
- Figure: Discrete training history metrics across beta.
- Figure: Continuous training history metrics across beta.
- Figure: Discrete metric heatmap and confusion matrices.
- Figure: Reconstruction vs KL tradeoff across latent type and beta.
Key takeaways
- Discrete mode recovery (NMI/accuracy) is strongest at lower beta, while KL pressure increases reconstruction cost as beta grows.
- Continuous latents remain highly linearly predictive of true modes across the sweep (probe accuracy ≈ 1.0, with a slight drop to 0.979 at beta = 5.0).
- Larger beta compresses continuous latent geometry (mean radius drops from 2.935 to 0.271), consistent with stronger prior matching.
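The mean-radius statistic in the last takeaway is just the average Euclidean norm of the posterior means; a one-line sketch (the function name is an assumption):

```python
import numpy as np

def mean_latent_radius(mu):
    """Average Euclidean norm of the posterior means mu_phi(x) over the
    dataset -- the 'Mean radius' column in the continuous summary table."""
    mu = np.asarray(mu, dtype=float)
    return float(np.linalg.norm(mu, axis=1).mean())
```

As $\beta$ grows, the KL term pulls each $\mu_\phi(x)$ toward the prior mean at the origin, so this statistic shrinking from 2.935 to 0.271 is the expected signature of stronger prior matching.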