---
base_model: LiquidAI/LFM2.5-1.2B-Instruct
library_name: transformers
model_name: LFM2.5-1.2B-Instruct-Coding
tags:
- generated_from_trainer
- grpo
- trl
- rlvr
- sandbox
- LoRA
- peft
licence: license
datasets:
- OpenCoder-LLM/opc-sft-stage2
---

# Model Card for LFM2.5-1.2B-Instruct-Coding

This model is a fine-tuned version of [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

👉 **Model training codebase and sandbox implementation for RLVR:** https://github.com/rparkr/lfm-coder

## Quick start

```python
from transformers import pipeline

question = "Create a Python function that calculates average running speed and pace based on distance covered and time."
generator = pipeline("text-generation", model="rparkr/LFM2.5-1.2B-Instruct-Coding", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=2048, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

Training uses Reinforcement Learning with Verifiable Rewards (RLVR): a Python sandbox runs test suites on model-written code, and the reward is calculated from the tests that pass.

### Framework versions

- TRL: 1.3.0
- Transformers: 5.6.2
- PyTorch: 2.11.0
- Datasets: 4.8.5
- Tokenizers: 0.22.2

## Citations

Cite GRPO as:

```bibtex
@article{shao2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}
```

Cite TRL as:

```bibtex
@software{vonwerra2020trl,
    title        = {{TRL: Transformers Reinforcement Learning}},
    author       = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
    license      = {Apache-2.0},
    url          = {https://github.com/huggingface/trl},
    year         = {2020}
}
```
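
To illustrate the test-based reward described under the training procedure, here is a minimal sketch. It is not the sandbox from the linked repository: `pass_rate_reward`, its parameters, and the one-assert-per-test-case format are assumptions made for this example, and a plain subprocess offers no real isolation, so untrusted code should only be run inside a proper sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def pass_rate_reward(solution_code: str, test_cases: list[str], timeout_s: float = 5.0) -> float:
    """Return the fraction of test cases that pass against the candidate code.

    Hypothetical helper: each test case is assumed to be a standalone
    snippet (for example a single assert) that exits non-zero on failure.
    """
    if not test_cases:
        return 0.0

    passed = 0
    with tempfile.TemporaryDirectory() as workdir:
        for i, test in enumerate(test_cases):
            # Write the candidate code followed by one test into its own script.
            script = Path(workdir) / f"case_{i}.py"
            script.write_text(solution_code + "\n\n" + test + "\n")
            try:
                result = subprocess.run(
                    [sys.executable, str(script)],
                    cwd=workdir,
                    capture_output=True,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                continue  # a hanging solution counts as a failed test
            if result.returncode == 0:
                passed += 1

    return passed / len(test_cases)


if __name__ == "__main__":
    candidate = "def pace(distance_km, minutes):\n    return minutes / distance_km\n"
    tests = [
        "assert pace(10, 50) == 5",  # passes
        "assert pace(1, 4) == 5",    # fails
    ]
    print(pass_rate_reward(candidate, tests))  # prints 0.5
```

A fractional pass rate gives partially correct completions a nonzero reward. Whether the actual training run used a fractional or an all-or-nothing reward is not stated here.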