2025-07-17
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
Researchers: Chongli Qin, Jost Tobias Springenberg
Links: arXiv paper; blog post; GitHub - LLM codebase; GitHub - Control Suite codebase
32B-reasoning-model
Contact: contact@independentresearch.ai