---
license: apache-2.0
library_name: transformers
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: text-generation
tags:
- causal-inference
- causal-reasoning
- reinforcement-learning
- grpo
---

# Can Post-Training Transform LLMs into Causal Reasoners?
This repository contains the **CauGym-GRPO-14B** model, a causal inference agent developed through targeted post-training of a 14B-parameter LLM.

- **Paper:** [Can Post-Training Transform LLMs into Causal Reasoners?](https://huggingface.co/papers/2602.06337)
- **Repository:** [GitHub - OpenCausaLab/CauGym](https://github.com/OpenCausaLab/CauGym)
## Model Description

The model was fine-tuned using **Group Relative Policy Optimization (GRPO)** on the CauGym dataset, which covers seven core causal inference tasks across interventional and counterfactual domains. The research demonstrates that targeted post-training enables smaller models to perform competitively with, or even surpass, much larger counterparts on complex causal tasks.
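The core idea of GRPO can be sketched in a few lines: for each prompt, a group of responses is sampled, and each response's reward is normalized against the group's mean and standard deviation, so no separate learned value function is needed. A minimal illustrative sketch (not the actual training code; population standard deviation is assumed here):

```python
import statistics


def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each reward against its sampled group,
    (r - mean(group)) / std(group). A uniform group yields zero advantage."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Four sampled answers to one causal question, scored 1 (correct) / 0 (wrong):
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Responses scoring above their group's mean receive positive advantage and are reinforced; those below are penalized, which is what drives the policy toward correct causal answers.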

### Key Features

- **Backbone:** DeepSeek-R1-Distill-Qwen-14B.
- **High Performance:** Achieves 93.5% accuracy on the CaLM benchmark, compared to 55.4% by OpenAI o3.
- **Robustness:** Exhibits strong generalization under real-world conditions, such as distribution shifts and noisy data.
- **Internalization:** Capable of independently recognizing and applying causal theorems such as the Backdoor Criterion.
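A minimal usage sketch with the standard `transformers` text-generation pipeline. The repo ID below is an assumption inferred from the model name on this card, so verify it against the actual Hub page; the example question is likewise illustrative. Model loading is kept behind the main guard so the small prompt-building helper can be reused on its own:

```python
def build_messages(question: str) -> list:
    """Wrap a causal-inference question as a chat-style message list."""
    return [{"role": "user", "content": question}]


if __name__ == "__main__":
    from transformers import pipeline  # heavy dependency, imported lazily

    # Assumed repo ID -- check the model's actual Hugging Face page.
    generator = pipeline(
        "text-generation",
        model="OpenCausaLab/CauGym-GRPO-14B",
        torch_dtype="auto",
        device_map="auto",
    )
    messages = build_messages(
        "Given the causal graph X -> Z -> Y with a confounder W of X and Y, "
        "is P(Y | do(X)) identifiable via the backdoor criterion?"
    )
    print(generator(messages, max_new_tokens=1024)[0]["generated_text"])
```

Recent `transformers` versions accept chat-style message lists directly in the text-generation pipeline; `device_map="auto"` additionally requires `accelerate` to be installed.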
## Citation

If you use this model, the CauGym dataset, or the associated research, please cite:
```bibtex
@misc{chen2026posttrainingtransformllmscausal,
      title={Can Post-Training Transform LLMs into Causal Reasoners?},
      author={Junqi Chen and Sirui Chen and Chaochao Lu},
      year={2026},
      eprint={2602.06337},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06337},
}
```