Literature: RLVR in LLMs
Papers on reinforcement learning with verifiable rewards (RLVR) for language models, with particular attention to the limits of passive on-policy exploration. The failure of such exploration to discover new reasoning strategies is one of the motivations for this project.
Scope includes:
- empirical studies of what RLVR does and does not teach relative to the base model (e.g. "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?");
- exploration methods for RLVR, including representation-based and off-policy approaches (e.g. "Representation-Based Exploration for Language Models: From Test-Time to Post-Training");
- related analyses of RLVR training dynamics, reward shaping, and strategy diversity collapse.
Papers
No entries yet.