Literature: RLVR in LLMs

Papers on reinforcement learning with verifiable rewards (RLVR) for language models, with particular attention to the limits of passive on-policy exploration, whose failure to discover new reasoning strategies is one of the motivations for this project.

Scope includes:

empirical studies of what RLVR does and does not teach relative to the base model (e.g. "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?");
exploration methods for RLVR, including representation-based and off-policy approaches (e.g. "Representation-Based Exploration for Language Models: From Test-Time to Post-Training");
related analyses of RLVR training dynamics, reward shaping, and the collapse of strategy diversity.

Papers

No entries yet.