CVAE Objective and Loss Terms

This note collects the loss-level notes in one place. Each of them was already a natural atomic unit in the original methodology thread.

Core Objective

The base training picture is the conditional ELBO:

$$\mathcal{L}_{\mathrm{ELBO}} = \mathbb{E}_{z \sim q_\xi(z \mid x, y)}\left[\log p_\psi(y \mid x, z)\right] - \beta\, \mathrm{KL}\!\left(q_\xi(z \mid x, y)\,\|\,p_\phi(z \mid x)\right).$$

The project’s methodology notes mainly study why this objective is under-specified in the entangled-initialized regime, and how to alter it so that $z$ carries strategy semantics rather than collapsing.
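As a concrete reference point, the conditional ELBO above can be sketched as a single-sample estimate. This is a minimal illustration, not the project's implementation. It assumes that both $q_\xi(z \mid x, y)$ and $p_\phi(z \mid x)$ are diagonal Gaussians and that the decoder log-likelihood factorizes over teacher-forced tokens. The function names `gaussian_kl` and `conditional_elbo` and all argument names are hypothetical.

```python
import math

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions.

    Assumption: posterior and prior are diagonal Gaussians parameterized
    by a mean and a log-variance per dimension.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

def conditional_elbo(token_log_probs, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Single-sample estimate of E_q[log p(y | x, z)] - beta * KL(q || p).

    token_log_probs: per-token log p(y_t | x, z, y_<t) for one z drawn
    from the posterior (assumed token-factorized decoder).
    """
    recon = sum(token_log_probs)
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return recon - beta * kl, recon, kl

# Usage: a 1-dimensional latent whose posterior mean is shifted away from the prior.
elbo, recon, kl = conditional_elbo(
    token_log_probs=[-0.5, -1.0],
    mu_q=[1.0], logvar_q=[0.0],
    mu_p=[0.0], logvar_p=[0.0],
    beta=2.0,
)
```

Returning the reconstruction and KL terms separately makes it easy to see when the KL term shrinks toward zero, which is the collapse behavior the methodology notes are concerned with.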

Atomic Loss Notes

Baseline-normalized reconstruction: Uses the entangled baseline loss to restore reconstruction scale.
Token-weighted excess reconstruction: Focuses reconstruction on strategy-sensitive teacher-forced positions.
Inter-latent divergence: Adds an exclusivity signal so different latents induce different predictive distributions.
InfoVAE connections: Reframes the ELBO through the mutual-information / marginal-matching lens.
Sequence-level divergence: Records a harder full-sequence divergence direction and why it remains deferred.
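
For orientation on the mutual-information / marginal-matching reading, the standard decomposition of the data-averaged KL term can be written for the conditional case as below. This is a sketch of the general identity, not a statement of what the InfoVAE connections note derives. The aggregate posterior $q_\xi(z \mid x)$ and the conditional mutual information $I_q(y; z \mid x)$, taken under the joint $p(x, y)\, q_\xi(z \mid x, y)$, are symbols introduced here for illustration.

$$\mathbb{E}_{p(x, y)}\!\left[\mathrm{KL}\!\left(q_\xi(z \mid x, y)\,\|\,p_\phi(z \mid x)\right)\right] = I_q(y; z \mid x) + \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\!\left(q_\xi(z \mid x)\,\|\,p_\phi(z \mid x)\right)\right], \qquad q_\xi(z \mid x) = \mathbb{E}_{p(y \mid x)}\!\left[q_\xi(z \mid x, y)\right].$$

The first term measures how much information $z$ carries about $y$ given $x$. The second measures how well the aggregate posterior matches the conditional prior. Penalizing the full KL with weight $\beta$ therefore penalizes both at once.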

How To Read This Cluster

Recommended order

Start with problem setup and model components. Then read token-weighted excess reconstruction and baseline-normalized reconstruction. Finally, read inter-latent divergence and InfoVAE connections as the main exclusivity / information-flow add-ons.