
CVAE Parameterization

Back to CVAE methodology

This note isolates the parameterization choices that are methodology-relevant rather than loss-specific.

General Parameterization

Let $\theta$ denote the base model parameters. In the most general case, the three components have separate parameter sets:

$$\phi, \qquad \psi, \qquad \xi.$$

Conceptually:

  • $\phi$ parameterizes the strategy router $p_\phi(z \mid x)$,
  • $\psi$ parameterizes the strategy-conditioned decoder $p_\psi(y \mid x, z)$,
  • $\xi$ parameterizes the inference network $q_\xi(z \mid x, y)$.

For small models, one valid implementation is to update all parameters of each component directly.
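As a concrete illustration, the three components can be held as separately parameterized modules and trained jointly. The sketch below is hypothetical: it uses small linear heads over a fixed feature dimension in place of a language-model backbone, and the names (`router`, `decoder`, `inference`) are illustrative, not from the source.

```python
import torch
import torch.nn as nn

class CVAEComponents(nn.Module):
    """Minimal sketch of the general parameterization.

    Hypothetical toy version: each component is a small linear head;
    a real implementation would wrap a full model per component.
    """

    def __init__(self, d_model: int = 32, num_strategies: int = 4):
        super().__init__()
        # phi: strategy router p_phi(z | x)
        self.router = nn.Linear(d_model, num_strategies)
        # psi: strategy-conditioned decoder p_psi(y | x, z)
        self.decoder = nn.Linear(d_model + num_strategies, d_model)
        # xi: inference network q_xi(z | x, y)
        self.inference = nn.Linear(2 * d_model, num_strategies)

    def route(self, x):
        # Distribution over discrete strategies given the input alone
        return torch.softmax(self.router(x), dim=-1)

    def infer(self, x, y):
        # Posterior over strategies given both input and output
        return torch.softmax(self.inference(torch.cat([x, y], dim=-1)), dim=-1)

model = CVAEComponents()
x, y = torch.randn(2, 32), torch.randn(2, 32)
pz = model.route(x)     # p_phi(z | x), shape (2, 4)
qz = model.infer(x, y)  # q_xi(z | x, y), shape (2, 4)
```

Because each component owns its own parameters, "update all parameters directly" here just means passing `model.parameters()` to a single optimizer.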

LoRA Parameterization

When parameter-efficient adaptation is preferred, each component may be written as a LoRA update of the base backbone:

$$\phi = \theta + \delta_\phi, \qquad \psi = \theta + \delta_\psi, \qquad \xi = \theta + \delta_\xi.$$

Trainable parameters are then the low-rank adapters and small heads:

  • $\delta_\phi$ and the router head,
  • $\delta_\psi$ and the decoder conditioning components,
  • $\delta_\xi$ and the inference head.

Backbone weights in $\theta$ remain frozen.
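The additive decomposition $\phi = \theta + \delta_\phi$ can be sketched with a standard LoRA-style linear layer: the base weight stands in for $\theta$ and is frozen, while a trainable low-rank product is the update $\delta$. This is a minimal sketch, not a production adapter; class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight (theta) plus trainable low-rank update (delta).

    Effective weight: W + (alpha / r) * B @ A, with W frozen
    and A, B trainable. B starts at zero so training begins
    exactly at the base model.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # backbone theta stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

# One independent adapter per component over the same shared backbone:
# delta_phi, delta_psi, delta_xi
base = nn.Linear(32, 32)
router_layer = LoRALinear(base, r=4)
decoder_layer = LoRALinear(base, r=4)
inference_layer = LoRALinear(base, r=4)
```

Wrapping the same `base` three times mirrors the equations above: all three components share one frozen $\theta$ and differ only in their low-rank deltas.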

Design knobs

Rank $r$, scale $\alpha$, dropout, and target modules are implementation knobs rather than method-defining commitments.
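These knobs might be collected in a per-component configuration. The sketch below is purely illustrative (field names loosely mirror common PEFT-style configs; the module names `q_proj`, `v_proj`, etc. assume a transformer backbone and are not specified by the source).

```python
# Hypothetical per-component LoRA settings; values are placeholders,
# not recommendations from the methodology.
lora_config = {
    "router":    {"r": 8,  "alpha": 16, "dropout": 0.05,
                  "target_modules": ["q_proj", "v_proj"]},
    "decoder":   {"r": 16, "alpha": 32, "dropout": 0.05,
                  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]},
    "inference": {"r": 8,  "alpha": 16, "dropout": 0.0,
                  "target_modules": ["q_proj", "v_proj"]},
}
```

Keeping these per component makes it easy to give the decoder more capacity than the router or inference network without changing the method itself.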

Unified Shared-Parameter Variant

For discrete latent strategies, one alternative is to share one parameter set between the router and the strategy-conditioned generator, while keeping a separate inference network.

Shared parameters:

$$\phi = \theta + \delta_\phi.$$

Separate inference network:

$$\xi = \theta + \delta_\xi.$$

Under this parameterization, the same causal LM can expose both a routing interface and a generation interface.
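One way to sketch this dual interface: a single shared module (standing in for the adapted backbone $\theta + \delta_\phi$) feeds both a strategy head for routing and an LM head for generation, while the inference network keeps its own weights. Everything below is a toy stand-in; a real version would adapt a causal LM and expose routing via reserved strategy tokens.

```python
import torch
import torch.nn as nn

class UnifiedCVAE(nn.Module):
    """Shared-parameter variant: one adapted backbone serves both the
    routing and generation interfaces; the inference network is separate.

    Hypothetical sketch; linear layers stand in for a causal LM.
    """

    def __init__(self, d_model: int = 32, num_strategies: int = 4, vocab: int = 100):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)  # stands in for theta + delta_phi
        self.strategy_head = nn.Linear(d_model, num_strategies)
        self.lm_head = nn.Linear(d_model, vocab)
        self.strategy_embed = nn.Embedding(num_strategies, d_model)
        self.inference_net = nn.Linear(2 * d_model, num_strategies)  # theta + delta_xi

    def route(self, x):
        # Routing interface: p(z | x) over discrete strategies
        return torch.softmax(self.strategy_head(self.shared(x)), dim=-1)

    def generate_logits(self, x, z):
        # Generation interface: same shared weights, conditioned on z
        h = self.shared(x + self.strategy_embed(z))
        return self.lm_head(h)

model = UnifiedCVAE()
x = torch.randn(2, 32)
z = torch.tensor([0, 3])          # discrete strategy choices
pz = model.route(x)               # routing view of the shared weights
logits = model.generate_logits(x, z)  # generation view of the same weights
```

Note that `route` and `generate_logits` both pass through `self.shared`, so gradients from either interface update the same $\delta_\phi$, which is the defining property of this variant.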

Next

Next: CVAE objective and loss terms
