---
license: mit
pipeline_tag: text-generation
---

# On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Charlie Zhang, Graham Neubig, Xiang Yue

Carnegie Mellon University, Language Technologies Institute
[![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) ![Python](https://img.shields.io/badge/python-3.9%2B-blue)
## Does Reinforcement Learning Truly Extend Reasoning?

This work examines the conflicting views on RL's effectiveness in extending language models' reasoning abilities: some characterize RL as a refiner of existing capabilities, while others see it as inducing new compositional skills. The disagreement persists largely because modern training pipelines offer little experimental control, and our work aims to resolve it through controlled analysis. This repository contains the mid-training-related checkpoints for the extrapolation tasks.

## 🔍 Overview

Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:

* **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
* **Contextual generalization** across diverse surface forms and linguistic contexts.
* How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.

## 🧠 Key Findings

You may also find the comic generated by NotebookLM [here](assets/Interplay-LM-Reasoning.pdf).

## Code

The code and data for this work will be released soon at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)

## 📚 Citation

If you find this work or code useful, please consider citing:

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```
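## Loading the Checkpoints

Until the GitHub release, the checkpoints hosted in this repository can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal, illustrative sketch under two assumptions: the checkpoints are standard causal-LM checkpoints, and the checkpoint identifier and prompt string are placeholders rather than actual names or formats from this repository.

```python
# Minimal usage sketch (assumption: the checkpoints here are standard causal-LM
# checkpoints loadable with Hugging Face transformers). The checkpoint ID below
# is a placeholder; replace it with an actual checkpoint path listed under
# "Files and versions" in this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_id = "<this-repo-or-checkpoint-id>"  # placeholder, not a real model ID

tokenizer = AutoTokenizer.from_pretrained(checkpoint_id)
model = AutoModelForCausalLM.from_pretrained(checkpoint_id)

# The exact prompt format of the synthetic reasoning tasks is defined by the
# paper's data generator; the string below is only a stand-in.
prompt = "<synthetic reasoning prompt>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation settings such as `max_new_tokens` are illustrative defaults, not tuned values from the paper.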