
CVAE Parameterization

Back to CVAE methodology

This note isolates the parameterization choices that are methodology-relevant rather than loss-specific.

General Parameterization

Let $\theta$ denote the base model parameters. In the most general case, the three components have separate parameter sets:

$$\phi, \qquad \psi, \qquad \xi.$$

Conceptually:

  • $\phi$ parameterizes the strategy router $p_\phi(z \mid x)$,
  • $\psi$ parameterizes the strategy-conditioned decoder $p_\psi(y \mid x, z)$,
  • $\xi$ parameterizes the inference network $q_\xi(z \mid x, y)$.

For small models, one valid implementation is to update all parameters of each component directly.
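As a concrete illustration, the three components can be held as separately parameterized modules and trained jointly. The sketch below is hypothetical: it uses small linear heads over a fixed feature dimension in place of a language-model backbone, and the names (`router`, `decoder`, `inference`) are illustrative, not from the source.

```python
import torch
import torch.nn as nn

class CVAEComponents(nn.Module):
    """Minimal sketch of the general parameterization.

    Hypothetical toy version: each component is a small linear head;
    a real implementation would wrap a full model per component.
    """

    def __init__(self, d_model: int = 32, num_strategies: int = 4):
        super().__init__()
        # phi: strategy router p_phi(z | x)
        self.router = nn.Linear(d_model, num_strategies)
        # psi: strategy-conditioned decoder p_psi(y | x, z)
        self.decoder = nn.Linear(d_model + num_strategies, d_model)
        # xi: inference network q_xi(z | x, y)
        self.inference = nn.Linear(2 * d_model, num_strategies)

    def route(self, x):
        # Distribution over discrete strategies given the input alone
        return torch.softmax(self.router(x), dim=-1)

    def infer(self, x, y):
        # Posterior over strategies given both input and output
        return torch.softmax(self.inference(torch.cat([x, y], dim=-1)), dim=-1)

model = CVAEComponents()
x, y = torch.randn(2, 32), torch.randn(2, 32)
pz = model.route(x)     # p_phi(z | x), shape (2, 4)
qz = model.infer(x, y)  # q_xi(z | x, y), shape (2, 4)
```

Because each component owns its own parameters, "update all parameters directly" here just means passing `model.parameters()` to a single optimizer.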

LoRA Parameterization

When parameter-efficient adaptation is preferred, each component may be written as a LoRA update of the base backbone:

$$\phi = \theta + \delta_\phi, \qquad \psi = \theta + \delta_\psi, \qquad \xi = \theta + \delta_\xi.$$

Trainable parameters are then the low-rank adapters and small heads:

  • $\delta_\phi$ and the router head,
  • $\delta_\psi$ and the decoder conditioning components,
  • $\delta_\xi$ and the inference head.

Backbone weights in $\theta$ remain frozen.
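The additive decomposition $\phi = \theta + \delta_\phi$ can be sketched with a standard LoRA-style linear layer: the base weight stands in for $\theta$ and is frozen, while a trainable low-rank product is the update $\delta$. This is a minimal sketch, not a production adapter; class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight (theta) plus trainable low-rank update (delta).

    Effective weight: W + (alpha / r) * B @ A, with W frozen
    and A, B trainable. B starts at zero so training begins
    exactly at the base model.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # backbone theta stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

# One independent adapter per component over the same shared backbone:
# delta_phi, delta_psi, delta_xi
base = nn.Linear(32, 32)
router_layer = LoRALinear(base, r=4)
decoder_layer = LoRALinear(base, r=4)
inference_layer = LoRALinear(base, r=4)
```

Wrapping the same `base` three times mirrors the equations above: all three components share one frozen $\theta$ and differ only in their low-rank deltas.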

Design knobs

Rank $r$, scale $\alpha$, dropout, and target modules are implementation knobs rather than method-defining commitments.
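These knobs might be collected in a per-component configuration. The sketch below is purely illustrative (field names loosely mirror common PEFT-style configs; the module names `q_proj`, `v_proj`, etc. assume a transformer backbone and are not specified by the source).

```python
# Hypothetical per-component LoRA settings; values are placeholders,
# not recommendations from the methodology.
lora_config = {
    "router":    {"r": 8,  "alpha": 16, "dropout": 0.05,
                  "target_modules": ["q_proj", "v_proj"]},
    "decoder":   {"r": 16, "alpha": 32, "dropout": 0.05,
                  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]},
    "inference": {"r": 8,  "alpha": 16, "dropout": 0.0,
                  "target_modules": ["q_proj", "v_proj"]},
}
```

Keeping these per component makes it easy to give the decoder more capacity than the router or inference network without changing the method itself.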

Unified Shared-Parameter Variant

For discrete latent strategies, one alternative is to share one parameter set between the router and the strategy-conditioned generator, while keeping a separate inference network.

Shared parameters:

$$\phi = \theta + \delta_\phi.$$

Separate inference network:

$$\xi = \theta + \delta_\xi.$$

Under this parameterization, the same causal LM can expose both a routing interface and a generation interface.
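One way to sketch this dual interface: a single shared module (standing in for the adapted backbone $\theta + \delta_\phi$) feeds both a strategy head for routing and an LM head for generation, while the inference network keeps its own weights. Everything below is a toy stand-in; a real version would adapt a causal LM and expose routing via reserved strategy tokens.

```python
import torch
import torch.nn as nn

class UnifiedCVAE(nn.Module):
    """Shared-parameter variant: one adapted backbone serves both the
    routing and generation interfaces; the inference network is separate.

    Hypothetical sketch; linear layers stand in for a causal LM.
    """

    def __init__(self, d_model: int = 32, num_strategies: int = 4, vocab: int = 100):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)  # stands in for theta + delta_phi
        self.strategy_head = nn.Linear(d_model, num_strategies)
        self.lm_head = nn.Linear(d_model, vocab)
        self.strategy_embed = nn.Embedding(num_strategies, d_model)
        self.inference_net = nn.Linear(2 * d_model, num_strategies)  # theta + delta_xi

    def route(self, x):
        # Routing interface: p(z | x) over discrete strategies
        return torch.softmax(self.strategy_head(self.shared(x)), dim=-1)

    def generate_logits(self, x, z):
        # Generation interface: same shared weights, conditioned on z
        h = self.shared(x + self.strategy_embed(z))
        return self.lm_head(h)

model = UnifiedCVAE()
x = torch.randn(2, 32)
z = torch.tensor([0, 3])          # discrete strategy choices
pz = model.route(x)               # routing view of the shared weights
logits = model.generate_logits(x, z)  # generation view of the same weights
```

Note that `route` and `generate_logits` both pass through `self.shared`, so gradients from either interface update the same $\delta_\phi$, which is the defining property of this variant.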

Next

Next: CVAE objective and loss terms
