---
license: apache-2.0
library_name: transformers
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: text-generation
tags:
- causal-inference
- causal-reasoning
- reinforcement-learning
- grpo
---
# Can Post-Training Transform LLMs into Causal Reasoners?
This repository contains the **CauGym-GRPO-14B** model, a causal inference agent developed through targeted post-training of a 14B parameter LLM.
- **Paper:** [Can Post-Training Transform LLMs into Causal Reasoners?](https://huggingface.co/papers/2602.06337)
- **Repository:** [GitHub - OpenCausaLab/CauGym](https://github.com/OpenCausaLab/CauGym)
## Model Description
The model was fine-tuned using **Group Relative Policy Optimization (GRPO)** on the CauGym dataset, which covers seven core causal inference tasks across interventional and counterfactual domains. The research demonstrates that targeted post-training enables smaller models to perform competitively with or even surpass much larger counterparts on complex causal tasks.
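At the core of GRPO is a group-relative advantage: rewards for a group of sampled completions to the same prompt are normalized against the group's own mean and standard deviation, removing the need for a learned value function. A minimal sketch of that computation (illustrative only; the actual training uses a full RL pipeline as described in the paper):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Normalize per-completion rewards within one sampling group.

    Each completion's advantage is its reward standardized against the
    group mean and standard deviation (eps avoids division by zero when
    all rewards in the group are identical).
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: two correct (reward 1) and two incorrect (reward 0) completions
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward the group's better samples.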
### Key Features
- **Backbone:** DeepSeek-R1-Distill-Qwen-14B.
- **High Performance:** Achieves 93.5% accuracy on the CaLM benchmark, compared to 55.4% by OpenAI o3.
- **Robustness:** Exhibits strong generalization under real-world conditions, such as distribution shifts and noisy data.
- **Internalization:** Capable of independently recognizing and applying causal theorems like the Backdoor Criterion.
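## Usage

A minimal inference sketch with the `transformers` library. The repository id below is an assumption based on the model name (check the page URL for the actual id), and the plain-text prompt helper is a hypothetical example; adapt it to the model's chat template if one is provided.

```python
def build_prompt(question: str) -> str:
    # Hypothetical plain instruction-style prompt for a causal question;
    # replace with tokenizer.apply_chat_template if the model ships one.
    return (
        "Answer the following causal inference question step by step.\n\n"
        f"Question: {question}\nAnswer:"
    )

def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Deferred import so the prompt helper works without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OpenCausaLab/CauGym-GRPO-14B"  # assumed repo id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tok(build_prompt(question), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, dropping the prompt.
    return tok.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

The 14B model requires roughly 30 GB of memory in 16-bit precision, so `device_map="auto"` is used to spread weights across available devices.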
## Citation
If you use the CauGym dataset or reference this research, please cite:
```bibtex
@misc{chen2026posttrainingtransformllmscausal,
  title={Can Post-Training Transform LLMs into Causal Reasoners?},
  author={Junqi Chen and Sirui Chen and Chaochao Lu},
  year={2026},
  eprint={2602.06337},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.06337},
}
```