
CVAE Model Components

Back to CVAE methodology

This note collects the router, decoder, and inference-network interfaces for both discrete and continuous latent variants.

Discrete Latent Variant

Router

$$p_\phi(z \mid x) = \mathrm{Cat}(\pi_\phi(x)), \qquad \pi_\phi(x) \in \Delta^{K-1}.$$
Implementation sketch

Run the base model on the $x$ sequence only. Extract one pooled sequence embedding $h_x$ from the final layer. Apply a learned prediction head to produce router logits and probabilities. The head is always trainable; the backbone may be frozen, LoRA-adapted, or fully updated.
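The steps above can be sketched in a few lines. This is an illustrative numpy stand-in for the prediction head only (the class name, shapes, and initialization are assumptions, not the actual implementation); the pooled embedding $h_x$ is taken as given.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class RouterHead:
    """Learned head mapping a pooled embedding h_x to pi_phi(x) over K strategies."""
    def __init__(self, d_model, num_strategies, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_model, num_strategies))
        self.b = np.zeros(num_strategies)

    def __call__(self, h_x):
        logits = h_x @ self.W + self.b   # router logits
        return softmax(logits)           # point on the simplex Delta^{K-1}
```

Only this head needs gradients; whether the backbone producing $h_x$ is frozen, LoRA-adapted, or fully updated is orthogonal to the interface.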

Strategy-conditioned decoder

$$p_\psi(y \mid x, z) = \prod_{t=1}^{T} p_\psi(y_t \mid x, z, y_{<t}).$$
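In log space this factorization is just a sum of per-token terms, which is how the decoder likelihood would actually be accumulated. A minimal sketch, assuming the per-token conditional probabilities have already been computed:

```python
import numpy as np

def sequence_log_prob(token_probs):
    """log p(y | x, z) = sum_t log p(y_t | x, z, y_{<t}),
    given the probability the decoder assigned to each realized token."""
    return float(np.sum(np.log(token_probs)))
```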

For discrete $z$, conditioning is implemented by injecting a learned strategy token or token-like conditioning slot.

Discrete conditioning template

<bos> <x> {x} </x> <z> {z} </z> <y> {y} </y> <eos>
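The template above can be assembled with a small helper. This is a hypothetical sketch (the function name and the inference-time behavior of stopping after `<y>` are assumptions):

```python
def build_decoder_input(x, z, y=None):
    """Render the discrete conditioning template.
    Training: pass y for the full supervised sequence.
    Inference: omit y; the sequence ends at <y> so the model generates y."""
    s = f"<bos> <x> {x} </x> <z> {z} </z> <y>"
    if y is not None:
        s += f" {y} </y> <eos>"
    return s
```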

Inference network

$$q_\xi(z \mid x, y) = \mathrm{Cat}(\pi_\xi(x, y)).$$
Implementation sketch

Run the base model on the full $(x, y)$ sequence. Pool the final-layer hidden states into one sequence representation. Use a learned head to produce posterior logits over latent strategies.
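The pooling step used by both the router and the inference network can be sketched as masked mean pooling; this is one common choice, assumed here for illustration (the note does not fix a pooling method):

```python
import numpy as np

def mean_pool(hidden, mask):
    """Pool final-layer hidden states (T, d) into one vector,
    averaging only over non-padding positions (mask: T, 1 = real token)."""
    m = mask[:, None].astype(hidden.dtype)
    return (hidden * m).sum(axis=0) / m.sum()
```

The resulting vector is then fed to the same kind of learned head as the router, just with posterior logits over strategies as output.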

Continuous Latent Variant

Router

$$p_\phi(z \mid x) = \mathcal{N}\!\left(\mu_\phi(x), \operatorname{diag}(\sigma_\phi(x)^2)\right).$$
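A minimal numpy sketch of such a Gaussian router head, assuming the pooled embedding $h_x$ as input (class name, log-variance parameterization, and shapes are illustrative assumptions):

```python
import numpy as np

class GaussianRouterHead:
    """Maps pooled h_x to (mu, sigma) of the diagonal Gaussian p_phi(z | x)."""
    def __init__(self, d_model, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W_mu = rng.normal(0.0, 0.02, (d_model, d_latent))
        self.W_logvar = rng.normal(0.0, 0.02, (d_model, d_latent))

    def __call__(self, h_x):
        mu = h_x @ self.W_mu
        sigma = np.exp(0.5 * (h_x @ self.W_logvar))  # predict log-variance for stability
        return mu, sigma

    def sample(self, h_x, rng):
        mu, sigma = self(h_x)
        eps = rng.standard_normal(mu.shape)
        return mu + sigma * eps  # reparameterization trick: differentiable w.r.t. mu, sigma
```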

Strategy-conditioned decoder

For continuous $z$, the decoder uses a projected latent feature injection such as prefix conditioning or adapter-side conditioning.

Continuous conditioning sketch

Sample or choose a latent vector $z$. Project it into the model dimension as $h_z$. Insert $h_z$ at a dedicated latent-conditioning position in the decoder input stream.
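The three steps above reduce to one projection and one concatenation. A hedged sketch, assuming the latent slot is prepended as a single prefix position (the placement and function name are illustrative, not prescribed by the note):

```python
import numpy as np

def inject_latent(token_embeds, z, W_proj):
    """Project z (d_latent,) into model dimension h_z (d_model,)
    and prepend it as one latent-conditioning slot before the token embeddings (T, d_model)."""
    h_z = z @ W_proj
    return np.vstack([h_z, token_embeds])  # shape (T + 1, d_model)
```

Adapter-side conditioning would instead add $h_z$ inside the blocks rather than to the input stream; the interface (a projected $z$) is the same.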

Inference network

$$q_\xi(z \mid x, y) = \mathcal{N}\!\left(\mu_\xi(x, y), \operatorname{diag}(\sigma_\xi(x, y)^2)\right).$$
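Since both $q_\xi(z \mid x, y)$ and $p_\phi(z \mid x)$ are diagonal Gaussians, the KL term that couples them in a CVAE objective has a closed form. A self-contained sketch of that standard formula:

```python
import numpy as np

def kl_diag_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, diag sigma_q^2) || N(mu_p, diag sigma_p^2) ),
    summed over latent dimensions."""
    var_q, var_p = sigma_q ** 2, sigma_p ** 2
    return float(np.sum(
        np.log(sigma_p / sigma_q)
        + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
        - 0.5
    ))
```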

Interface Summary

  • Router: chooses or parameterizes strategy latents from $x$.
  • Decoder: generates $y$ from $(x, z)$.
  • Inference model: approximates the posterior over $z$ from $(x, y)$.

Common to both variants is that the latent is meant to control high-level strategy, not merely local token noise.

Next

Next: CVAE parameterization
