+---
+license: apache-2.0
+library_name: transformers
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+pipeline_tag: text-generation
+tags:
+- causal-inference
+- causal-reasoning
+- reinforcement-learning
+- grpo
+---
+# Can Post-Training Transform LLMs into Causal Reasoners?
+This repository contains the **CauGym-GRPO-14B** model, a causal inference agent developed through targeted post-training of a 14B-parameter LLM.
+- **Paper:** [Can Post-Training Transform LLMs into Causal Reasoners?](https://huggingface.co/papers/2602.06337)
+- **Repository:** [GitHub - OpenCausaLab/CauGym](https://github.com/OpenCausaLab/CauGym)
+## Model Description
+The model was fine-tuned using **Group Relative Policy Optimization (GRPO)** on the CauGym dataset, which covers seven core causal inference tasks across interventional and counterfactual domains. The research demonstrates that targeted post-training enables smaller models to match or even surpass much larger counterparts on complex causal tasks.
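As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the binary correctness reward, and the use of the population standard deviation are assumptions made for this example; the actual reward design and training code used for CauGym are not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration only; `eps` guards against a
    zero standard deviation when every response gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (an assumption of this sketch)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one causal question,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers receive positive advantages, incorrect ones negative.
print(advantages)
```

Because the baseline is the group mean, no separate value model is needed; a group in which every response earns the same reward yields zero advantages and therefore contributes no learning signal.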
+### Key Features
+- **Backbone:** DeepSeek-R1-Distill-Qwen-14B.
+- **High Performance:** Achieves 93.5% accuracy on the CaLM benchmark, compared with 55.4% for OpenAI o3.
+- **Robustness:** Exhibits strong generalization under real-world conditions, such as distribution shifts and noisy data.
+- **Internalization:** Capable of independently recognizing and applying causal theorems like the Backdoor Criterion.
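To make the Backdoor Criterion mentioned above concrete, here is a minimal worked sketch of backdoor adjustment, P(Y=1 | do(X=x)) = Σ_z P(Y=1 | X=x, Z=z) · P(Z=z), for a single binary confounder Z that satisfies the criterion. All probabilities and names below are invented for illustration and do not come from the CauGym dataset or the CaLM benchmark.

```python
# Hypothetical distributions for binary treatment X, outcome Y, confounder Z.
p_z = {0: 0.6, 1: 0.4}  # P(Z = z)
p_y1_given_xz = {       # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.2,
    (0, 1): 0.5,
    (1, 0): 0.4,
    (1, 1): 0.8,
}


def backdoor_adjust(x, p_z, p_y1_given_xz):
    """Return P(Y=1 | do(X=x)) by averaging over the confounder Z."""
    return sum(p_y1_given_xz[(x, z)] * pz for z, pz in p_z.items())


p_do1 = backdoor_adjust(1, p_z, p_y1_given_xz)  # 0.4*0.6 + 0.8*0.4
p_do0 = backdoor_adjust(0, p_z, p_y1_given_xz)  # 0.2*0.6 + 0.5*0.4
ate = p_do1 - p_do0  # average treatment effect under these assumptions

print(f"P(Y=1|do(X=1)) = {p_do1:.2f}")
print(f"P(Y=1|do(X=0)) = {p_do0:.2f}")
print(f"ATE = {ate:.2f}")
```

The adjustment weights each stratum by P(Z=z) rather than P(Z=z | X=x), which is what separates the interventional quantity from the plain conditional probability P(Y=1 | X=x).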
+## Citation
+If you use the CauGym dataset or reference this research, please cite:
+```bibtex
+@misc{chen2026posttrainingtransformllmscausal,
+      title={Can Post-Training Transform LLMs into Causal Reasoners?},
+      author={Junqi Chen and Sirui Chen and Chaochao Lu},
+      year={2026},
+      eprint={2602.06337},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2602.06337},
+}
+```