
CVAE Model Components

Back to CVAE methodology

This note collects the router, decoder, and inference-network interfaces for both discrete and continuous latent variants.

Discrete Latent Variant

Router

$$p_\phi(z \mid x) = \mathrm{Cat}(\pi_\phi(x)), \qquad \pi_\phi(x) \in \Delta^{K-1}.$$
Implementation sketch

Run the base model on the $x$ sequence only. Extract one pooled sequence embedding $h_x$ from the final layer. Apply a learned prediction head to produce router logits and probabilities. The head is always trainable; the backbone may be frozen, LoRA-adapted, or fully updated.
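The steps above can be sketched in a few lines. This is an illustrative numpy stand-in for the prediction head only (the class name, shapes, and initialization are assumptions, not the actual implementation); the pooled embedding $h_x$ is taken as given.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class RouterHead:
    """Learned head mapping a pooled embedding h_x to pi_phi(x) over K strategies."""
    def __init__(self, d_model, num_strategies, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_model, num_strategies))
        self.b = np.zeros(num_strategies)

    def __call__(self, h_x):
        logits = h_x @ self.W + self.b   # router logits
        return softmax(logits)           # point on the simplex Delta^{K-1}
```

Only this head needs gradients; whether the backbone producing $h_x$ is frozen, LoRA-adapted, or fully updated is orthogonal to the interface.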

Strategy-conditioned decoder

$$p_\psi(y \mid x, z) = \prod_{t=1}^{T} p_\psi(y_t \mid x, z, y_{<t}).$$
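In log space this factorization is just a sum of per-token terms, which is how the decoder likelihood would actually be accumulated. A minimal sketch, assuming the per-token conditional probabilities have already been computed:

```python
import numpy as np

def sequence_log_prob(token_probs):
    """log p(y | x, z) = sum_t log p(y_t | x, z, y_{<t}),
    given the probability the decoder assigned to each realized token."""
    return float(np.sum(np.log(token_probs)))
```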

For discrete $z$, conditioning is implemented by injecting a learned strategy token or token-like conditioning slot.

Discrete conditioning template

<bos> <x> {x} </x> <z> {z} </z> <y> {y} </y> <eos>
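The template above can be assembled with a small helper. This is a hypothetical sketch (the function name and the inference-time behavior of stopping after `<y>` are assumptions):

```python
def build_decoder_input(x, z, y=None):
    """Render the discrete conditioning template.
    Training: pass y for the full supervised sequence.
    Inference: omit y; the sequence ends at <y> so the model generates y."""
    s = f"<bos> <x> {x} </x> <z> {z} </z> <y>"
    if y is not None:
        s += f" {y} </y> <eos>"
    return s
```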

Inference network

$$q_\xi(z \mid x, y) = \mathrm{Cat}(\pi_\xi(x, y)).$$
Implementation sketch

Run the base model on the full $(x, y)$ sequence. Pool the final-layer hidden states into one sequence representation. Use a learned head to produce posterior logits over latent strategies.
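The pooling step used by both the router and the inference network can be sketched as masked mean pooling; this is one common choice, assumed here for illustration (the note does not fix a pooling method):

```python
import numpy as np

def mean_pool(hidden, mask):
    """Pool final-layer hidden states (T, d) into one vector,
    averaging only over non-padding positions (mask: T, 1 = real token)."""
    m = mask[:, None].astype(hidden.dtype)
    return (hidden * m).sum(axis=0) / m.sum()
```

The resulting vector is then fed to the same kind of learned head as the router, just with posterior logits over strategies as output.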

Continuous Latent Variant

Router

$$p_\phi(z \mid x) = \mathcal{N}\!\left(\mu_\phi(x), \operatorname{diag}(\sigma_\phi(x)^2)\right).$$
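A minimal numpy sketch of such a Gaussian router head, assuming the pooled embedding $h_x$ as input (class name, log-variance parameterization, and shapes are illustrative assumptions):

```python
import numpy as np

class GaussianRouterHead:
    """Maps pooled h_x to (mu, sigma) of the diagonal Gaussian p_phi(z | x)."""
    def __init__(self, d_model, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W_mu = rng.normal(0.0, 0.02, (d_model, d_latent))
        self.W_logvar = rng.normal(0.0, 0.02, (d_model, d_latent))

    def __call__(self, h_x):
        mu = h_x @ self.W_mu
        sigma = np.exp(0.5 * (h_x @ self.W_logvar))  # predict log-variance for stability
        return mu, sigma

    def sample(self, h_x, rng):
        mu, sigma = self(h_x)
        eps = rng.standard_normal(mu.shape)
        return mu + sigma * eps  # reparameterization trick: differentiable w.r.t. mu, sigma
```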

Strategy-conditioned decoder

For continuous $z$, the decoder uses a projected latent feature injection such as prefix conditioning or adapter-side conditioning.

Continuous conditioning sketch

Sample or choose a latent vector $z$. Project it into the model dimension as $h_z$. Insert $h_z$ at a dedicated latent-conditioning position in the decoder input stream.
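The three steps above reduce to one projection and one concatenation. A hedged sketch, assuming the latent slot is prepended as a single prefix position (the placement and function name are illustrative, not prescribed by the note):

```python
import numpy as np

def inject_latent(token_embeds, z, W_proj):
    """Project z (d_latent,) into model dimension h_z (d_model,)
    and prepend it as one latent-conditioning slot before the token embeddings (T, d_model)."""
    h_z = z @ W_proj
    return np.vstack([h_z, token_embeds])  # shape (T + 1, d_model)
```

Adapter-side conditioning would instead add $h_z$ inside the blocks rather than to the input stream; the interface (a projected $z$) is the same.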

Inference network

$$q_\xi(z \mid x, y) = \mathcal{N}\!\left(\mu_\xi(x, y), \operatorname{diag}(\sigma_\xi(x, y)^2)\right).$$
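Since both $q_\xi(z \mid x, y)$ and $p_\phi(z \mid x)$ are diagonal Gaussians, the KL term that couples them in a CVAE objective has a closed form. A self-contained sketch of that standard formula:

```python
import numpy as np

def kl_diag_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, diag sigma_q^2) || N(mu_p, diag sigma_p^2) ),
    summed over latent dimensions."""
    var_q, var_p = sigma_q ** 2, sigma_p ** 2
    return float(np.sum(
        np.log(sigma_p / sigma_q)
        + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
        - 0.5
    ))
```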

Interface Summary

  • Router: chooses or parameterizes strategy latents from $x$.
  • Decoder: generates $y$ from $(x, z)$.
  • Inference model: approximates the posterior over $z$ from $(x, y)$.

Common to both variants is that the latent is meant to control high-level strategy, not merely local token noise.

Next

Next: CVAE parameterization
