Discrete Latent VAEs
This note records the training-specific considerations that arise when the latent variable is discrete rather than continuous.
Related notes:
A discrete-state VAE uses categorical latent variables instead of Gaussian latents.
Model and parameterization
Given data $x$ and, optionally, conditional context $c$:
- prior: $p(z)$, often a uniform categorical, or a conditional prior $p(z \mid c)$
- encoder / inference model: logits $\ell = f_\phi(x)$ with $q_\phi(z \mid x) = \mathrm{Cat}(z;\, \mathrm{softmax}(\ell))$
- decoder: $p_\theta(x \mid z)$ or $p_\theta(x \mid z, c)$, with $z$ provided through an embedding or other conditioning interface
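As a minimal numpy sketch of this parameterization (shapes, `W_enc`, and `EMB` are illustrative stand-ins, not part of this note):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D_x, D_emb = 8, 16, 4          # latent categories, data dim, embedding dim (made up)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Encoder: a stand-in linear map from data x to K logits.
W_enc = rng.normal(size=(D_x, K))
x = rng.normal(size=(D_x,))
logits = x @ W_enc
q = softmax(logits)               # q_phi(z | x): a categorical distribution over K values

# Sample a discrete latent and hand it to the decoder via an embedding table.
z = rng.choice(K, p=q)
EMB = rng.normal(size=(K, D_emb)) # conditioning interface: one embedding per category
z_emb = EMB[z]                    # the decoder consumes this embedding
```

In a real model the linear map would be a neural encoder, but the categorical bookkeeping is the same.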
ELBO objective
The negative ELBO is
$$\mathcal{L}(\theta, \phi) = -\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] + \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right).$$
In the conditional case, replace $p(z)$ with a conditional prior such as $p(z \mid c)$ and condition the decoder on $c$ as well.
KL for categorical latents
For categorical $q_\phi(z \mid x)$ and $p(z)$ with probabilities $q_1, \dots, q_K$ and $p_1, \dots, p_K$,
$$\mathrm{KL}(q \,\|\, p) = \sum_{k=1}^{K} q_k \log \frac{q_k}{p_k}.$$
This term is analytic, so no sampling is needed for the KL itself.
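Concretely, the analytic KL against a uniform prior takes a few lines of numpy (probability values here are illustrative):

```python
import numpy as np

def categorical_kl(q, p, eps=1e-12):
    """KL(q || p) for two categorical distributions given as probability vectors."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    # eps guards the logs against exact zeros in either distribution.
    return float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))

K = 4
q = np.array([0.7, 0.1, 0.1, 0.1])   # encoder probabilities for one data point
p = np.full(K, 1.0 / K)              # uniform categorical prior

kl = categorical_kl(q, p)            # exact; no sampling involved
```

Because the sum runs over all $K$ categories in closed form, this term contributes no Monte Carlo variance to the objective.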
Reconstruction expectation
The main practical issue is the reconstruction expectation
$$\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right].$$
Two main regimes matter:
- Exact marginalization for small $K$: compute $\sum_{k=1}^{K} q_\phi(z = k \mid x)\, \log p_\theta(x \mid z = k)$ directly. This is low variance, unbiased, and differentiable.
- Monte Carlo for large $K$: sample $z \sim q_\phi(z \mid x)$ and evaluate $\log p_\theta(x \mid z)$. This is straightforward for decoder parameters but raises the usual discrete-gradient issue for the encoder.
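The two regimes can be contrasted directly. In this numpy sketch, `log_px_given_z` is a made-up stand-in for the per-category decoder log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 6
q = rng.dirichlet(np.ones(K))             # q_phi(z | x) for one data point
log_px_given_z = rng.normal(size=K)       # toy log p_theta(x | z=k), one value per category

# Exact marginalization: a weighted sum over all K categories (low variance, differentiable).
recon_exact = np.sum(q * log_px_given_z)

# Monte Carlo: sample z ~ q and average (unbiased, but noisy; this is where the
# encoder's discrete-gradient problem shows up, since sampling breaks the pathwise gradient).
S = 10_000
zs = rng.choice(K, size=S, p=q)
recon_mc = log_px_given_z[zs].mean()
```

With enough samples the Monte Carlo estimate converges to the exact sum; the difficulty is not the estimate itself but differentiating it with respect to the encoder parameters.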
Differentiating through discrete sampling
Sampling a categorical latent is not pathwise differentiable. Common options are:
- REINFORCE / score-function estimators: often with a baseline for variance reduction
- Gumbel-Softmax / Concrete relaxations: use a soft one-hot approximation in the backward pass
- control-variate estimators such as VIMCO, RELAX, or REBAR: more complex, but potentially lower variance
For small or moderate $K$, prefer exact marginalization. For larger or structured discrete latent spaces, a Gumbel-Softmax-style relaxation is often the simplest practical choice.
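A minimal Gumbel-Softmax sample, forward pass only (numpy sketch; in practice a framework's autograd handles the backward pass, and the temperature values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))          # standard Gumbel(0, 1) noise
    return softmax((logits + gumbel) / temperature)

logits = np.array([2.0, 0.5, -1.0, 0.0])

y_soft = gumbel_softmax_sample(logits, temperature=1.0, rng=rng)  # soft one-hot, differentiable
y_hard = gumbel_softmax_sample(logits, temperature=0.1, rng=rng)  # near one-hot as tau -> 0

# Straight-through variant: use the hard argmax in the forward pass while
# letting gradients flow through the soft sample in the backward pass.
z_index = int(np.argmax(y_hard))
```

As the temperature anneals toward zero the samples approach true one-hot vectors, at the cost of higher-variance gradients.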
Why this note matters here
The main methodology cluster already covers the discrete-latent interfaces. What this note adds is the training-specific reminder that discrete latents change the reconstruction expectation and gradient-estimation story, even when the high-level CVAE factorization looks the same.