
Exp1 Gaussian Mixture: Larger Beta Sweep (Concise Report)

Artifact source: path:experiments/gaussian_mixture/demos/artifacts/exp1_gaussian_mixture_high_density-larger-beta-sweep

Methodology

Notation

  • $x \in \mathbb{R}^d$ is an observed sample and $z \in \{1, \ldots, K\}$ is a latent mode.
  • $\mathbb{E}$ denotes expectation, $\mathrm{KL}$ denotes KL divergence, and $\mathrm{MSE}$ denotes mean squared error.
  • $\mathcal{N}$ denotes a Gaussian distribution and $\mathrm{Unif}$ denotes a uniform categorical prior.

Synthetic data is sampled from a labeled Gaussian mixture: $z \sim \mathrm{Cat}(\pi)$, $x \mid z = k \sim \mathcal{N}(\mu_k, \sigma^2 I_d)$, $x \in \mathbb{R}^d$. The larger sweep uses $K=3$, $d=2$, $N=240$, and $\beta \in \{0.10, 0.25, 0.50, 1.00, 2.00, 5.00\}$.
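The data-generating process above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repo's actual generator: the placement of the component means is assumed (here, on a circle whose radius is the separation parameter), and the helper name `sample_gaussian_mixture` is hypothetical.

```python
import numpy as np

def sample_gaussian_mixture(n=240, k=3, d=2, std=0.7, separation=4.0, seed=17):
    """Sample (x, z) from a uniform K-component isotropic Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # Assumed mean placement: K means spaced evenly on a circle (d = 2).
    angles = 2 * np.pi * np.arange(k) / k
    means = separation * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    z = rng.integers(0, k, size=n)                 # z ~ Cat(uniform)
    x = means[z] + std * rng.standard_normal((n, d))  # x | z ~ N(mu_z, std^2 I)
    return x, z

x, z = sample_gaussian_mixture()
print(x.shape, z.shape)  # (240, 2) (240,)
```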

Model definitions (priors + objectives)

Both variants optimize $L(x) = L_\mathrm{recon}(x) + \beta L_\mathrm{kl}(x)$.

  • Discrete latent VAE

    • Posterior: $q_\phi(z = k \mid x) = \operatorname{softmax}(f_\phi(x))_k$, $k \in \{1, \ldots, K\}$.
    • Prior: uniform categorical $p(z) = \mathrm{Unif}([K])$.
    • Implementation note: training uses exact softmax-weighted expectation over categories (no Gumbel-softmax sampling).
    • Objective: $L_\mathrm{disc}(x) = \sum_{k=1}^{K} q_\phi(z = k \mid x)\, \mathrm{MSE}(x, \hat{x}_k) + \beta\, \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))$.
    • Reported history/summary metrics: loss, recon, kl, nmi, clustering_accuracy.
  • Continuous latent VAE

    • Posterior: $q_\phi(z \mid x) = \mathcal{N}(\mu_\phi(x), \operatorname{diag}(\sigma_\phi(x)^2))$.
    • Prior: standard Gaussian $p(z) = \mathcal{N}(0, I)$.
    • Objective: $L_\mathrm{cont}(x) = \mathbb{E}_{q_\phi(z \mid x)}[\mathrm{MSE}(x, \hat{x}(z))] + \beta\, \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))$.
    • Reported history/summary metrics: loss, recon, kl, linear_probe_accuracy.
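Both KL terms above have simple closed forms, which a sketch makes concrete: for the discrete variant, $\mathrm{KL}(q \,\|\, \mathrm{Unif}([K])) = \log K - H(q)$; for the continuous variant, the standard diagonal-Gaussian-to-$\mathcal{N}(0, I)$ formula. This is an assumed NumPy rendering, not the repo's training code.

```python
import numpy as np

def kl_categorical_uniform(q, eps=1e-12):
    """KL(q(z|x) || Unif([K])) = sum_k q_k (log q_k + log K), per sample."""
    k = q.shape[-1]
    return np.sum(q * (np.log(q + eps) + np.log(k)), axis=-1)

def kl_diag_gaussian_standard(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) in closed form, per sample."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)

q = np.array([[1/3, 1/3, 1/3], [0.9, 0.05, 0.05]])
print(kl_categorical_uniform(q))  # first entry ~0: a uniform posterior pays no KL
```

A uniform posterior incurs (near-)zero KL, which is why small $\beta$ leaves the discrete posterior free to commit to a mode while large $\beta$ pushes it back toward uniform.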

Metric definitions

  • recon: mean squared reconstruction error.
  • kl: KL divergence term in the objective.
  • nmi: normalized mutual information between predicted and true modes (discrete).
  • clustering_accuracy: permutation-invariant label alignment accuracy $\max_{\pi \in S_K} \frac{1}{N} \sum_{i} \mathbf{1}[\pi(\hat{z}_i) = z_i]$.
  • linear_probe_accuracy: holdout linear classifier accuracy from continuous latent means to true modes.
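The clustering_accuracy metric above can be sketched directly from its definition: search over all $K!$ relabelings and keep the best match. Exhaustive search is fine at $K=3$; for larger $K$ the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) would scale better. The function name here is illustrative, not the repo's.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(z_hat, z_true, k):
    """max over permutations pi of mean(pi(z_hat) == z_true)."""
    best = 0.0
    for perm in permutations(range(k)):
        mapped = np.array([perm[c] for c in z_hat])
        best = max(best, float(np.mean(mapped == z_true)))
    return best

z_true = np.array([0, 0, 1, 1, 2, 2])
z_hat = np.array([2, 2, 0, 0, 1, 1])  # same clusters, permuted labels
print(clustering_accuracy(z_hat, z_true, 3))  # → 1.0
```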

Experimental Setup

| Category | Parameter | Value |
| --- | --- | --- |
| Dataset | Components ($K$) | 3 |
| Dataset | Dimension ($d$) | 2 |
| Dataset | Samples per sweep point ($N$) | 240 |
| Dataset | Component std | 0.7 |
| Dataset | Component separation | 4.0 |
| Dataset | Mixture weights | None (uniform default) |
| Training | Epochs | 20 |
| Training | Batch size | 64 |
| Training | Hidden dim | 32 |
| Training | Latent dim (continuous) | 2 |
| Training | Optimizer | Adam |
| Training | Learning rate | $10^{-3}$ |
| Sweep | $\beta$ grid | $\{0.10, 0.25, 0.50, 1.00, 2.00, 5.00\}$ |
| Reproducibility | Seed | 17 |
| Reproducibility | Device | cpu |
| Outputs | Artifact root | path:experiments/gaussian_mixture/demos/artifacts/exp1_gaussian_mixture_high_density-larger-beta-sweep |

Quantitative results

Discrete latent summary

| $\beta$ | NMI | Acc | Recon | KL final |
| --- | --- | --- | --- | --- |
| 0.10 | 0.762 | 0.729 | 6.488 | 1.020 |
| 0.25 | 0.762 | 0.729 | 6.548 | 0.895 |
| 0.50 | 0.735 | 0.717 | 6.679 | 0.670 |
| 1.00 | 0.720 | 0.704 | 7.103 | 0.359 |
| 2.00 | 0.720 | 0.704 | 7.788 | 0.035 |
| 5.00 | 0.690 | 0.700 | 7.901 | 0.001 |

Continuous latent summary

| $\beta$ | Probe Acc | Recon | KL final | Mean radius |
| --- | --- | --- | --- | --- |
| 0.10 | 1.000 | 3.272 | 5.125 | 2.935 |
| 0.25 | 1.000 | 3.521 | 3.482 | 2.451 |
| 0.50 | 1.000 | 3.909 | 2.477 | 2.062 |
| 1.00 | 1.000 | 4.895 | 1.322 | 1.533 |
| 2.00 | 1.000 | 6.505 | 0.421 | 0.875 |
| 5.00 | 0.979 | 7.827 | 0.047 | 0.271 |

Figures

Figure: Discrete training history metrics across beta.

Figure: Continuous training history metrics across beta.

Figure: Discrete metric heatmap and confusion matrices.

Figure: Reconstruction vs KL tradeoff across latent type and beta.

Key takeaways

  • Discrete mode recovery (NMI/accuracy) is strongest at lower beta, while stronger KL pressure at higher beta increases reconstruction cost.
  • Continuous latents remain highly linearly predictive of the true modes across the sweep (probe accuracy ~1.0, except for a slight drop to 0.979 at beta = 5.0).
  • Larger beta compresses the continuous latent geometry (mean radius drops from 2.935 to 0.271), consistent with stronger prior matching.