CVAE Model Components
This note collects the router, decoder, and inference-network interfaces for both discrete and continuous latent variants.
Discrete Latent Variant
Router
Run the base model on the input sequence x only. Extract one pooled sequence embedding from the final layer. Apply a learned prediction head to produce router logits and probabilities over latent strategies. The head is always trainable; the backbone may be frozen, LoRA-adapted, or fully updated.
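The router steps above can be sketched without any framework. This is a minimal illustration, not the actual implementation: mean pooling, a single linear head, and the toy shapes (5 tokens, hidden size 4, 3 strategies) are all assumptions, since the note only specifies "pooled" and "learned head".

```python
import math
import random

def mean_pool(hidden_states):
    """Average final-layer token vectors into one sequence embedding (assumed pooling)."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]

def linear_head(embedding, weight, bias):
    """Assumed router head: one logit per strategy; weight has shape K x dim."""
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(weight, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(hidden_states, weight, bias):
    """Pool, apply the head, and return (logits, probabilities)."""
    pooled = mean_pool(hidden_states)
    logits = linear_head(pooled, weight, bias)
    return logits, softmax(logits)

# Toy stand-in for final-layer hidden states of the base model run on x.
rng = random.Random(0)
hidden = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
W = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(3)]
b = [0.0, 0.0, 0.0]
logits, probs = route(hidden, W, b)
```

In a real setup the hidden states would come from the backbone, and only the head parameters (here W and b) are guaranteed to receive gradients.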
Strategy-conditioned decoder
For discrete z, conditioning is implemented by injecting a learned strategy token, or a token-like conditioning slot, into the decoder input:
<bos> <x> {x} </x> <z> {z} </z> <y> {y} </y> <eos>
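A small helper can fill this template. It is a sketch under assumptions: the function name, the strategy-token spelling `<strategy_2>`, and the choice to stop after the opening `<y>` tag at generation time are hypothetical and not taken from the note.

```python
def format_decoder_input(x, z, y=None):
    """Fill the decoder template with input x, strategy token z, and target y.

    When y is None (assumed generation-time usage), the string stops after
    the opening <y> tag so the decoder can continue from there.
    """
    prefix = f"<bos> <x> {x} </x> <z> {z} </z> <y>"
    if y is None:
        return prefix
    return f"{prefix} {y} </y> <eos>"

# Training-style example with a hypothetical strategy token.
train_text = format_decoder_input("2 + 2 = ?", "<strategy_2>", "4")
# Generation-style prompt: the decoder would produce y after this prefix.
prompt = format_decoder_input("2 + 2 = ?", "<strategy_2>")
```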
Inference network
Run the base model on the full sequence, that is, x together with y. Pool the final-layer hidden states into one sequence representation. Use a learned head to produce posterior logits over latent strategies.
Continuous Latent Variant
Router
Strategy-conditioned decoder
For continuous z, the decoder uses projected latent feature injection, such as prefix conditioning or adapter-side conditioning.
Sample or choose a latent vector z. Project it into the model dimension. Insert the projected vector at a dedicated latent-conditioning position in the decoder input stream.
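The project-and-insert steps can be illustrated with plain lists. This is a sketch only: the linear projection, the function names, and placing the latent at position 0 (a one-slot prefix) are assumptions; the note does not fix the projection form or the slot position.

```python
def project_latent(z, weight, bias):
    """Map latent z (length d_z) to the model dimension; weight is d_model x d_z."""
    return [sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(weight, bias)]

def insert_latent(token_embeddings, latent_embedding, position=0):
    """Return a new embedding sequence with the latent vector placed at `position`."""
    if len(latent_embedding) != len(token_embeddings[0]):
        raise ValueError("latent embedding must match the model dimension")
    return (token_embeddings[:position]
            + [latent_embedding]
            + token_embeddings[position:])

# Toy example: d_z = 2, d_model = 3, two token embeddings.
z = [1.0, -1.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.5]
latent_slot = project_latent(z, W, b)
tokens = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
decoder_inputs = insert_latent(tokens, latent_slot, position=0)
```

The decoder then attends to the latent slot like any other input position, so the sequence length grows by one.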
Inference network
Interface Summary
- Router: chooses or parameterizes strategy latents z from x.
- Decoder: generates y from x and z.
- Inference model: approximates the posterior over z from x and y.
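The three interfaces can be written down as type signatures. This is a hedged sketch: the class names, the use of strings for x and y, a flat list of floats for latent parameters (logits in the discrete case, vector parameters in the continuous case), and the `generate` helper are all illustrative assumptions.

```python
from typing import Protocol, Sequence

class Router(Protocol):
    """Maps x to latent parameters."""
    def __call__(self, x: str) -> Sequence[float]: ...

class Decoder(Protocol):
    """Generates y from x and a latent z."""
    def __call__(self, x: str, z: Sequence[float]) -> str: ...

class InferenceModel(Protocol):
    """Maps (x, y) to posterior parameters over z."""
    def __call__(self, x: str, y: str) -> Sequence[float]: ...

def generate(router: Router, decoder: Decoder, x: str) -> str:
    """Assumed generation path: the latent comes from the router; no y is available."""
    z = router(x)
    return decoder(x, z)

# Stub components, only to show the data flow.
def uniform_router(x: str) -> Sequence[float]:
    return [0.0, 0.0]  # equal logits over two hypothetical strategies

def stub_decoder(x: str, z: Sequence[float]) -> str:
    return f"{x}->{len(z)}"

result = generate(uniform_router, stub_decoder, "hi")
```

The inference model does not appear in `generate` because it needs y as input.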
The shared point across both variants is that the latent is meant to control high-level strategy, not merely local token noise.