Add model card and metadata for CauGym-GRPO-14B

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+ pipeline_tag: text-generation
+ tags:
+ - causal-inference
+ - causal-reasoning
+ - reinforcement-learning
+ - grpo
+ ---
+
+ # Can Post-Training Transform LLMs into Causal Reasoners?
+
+ This repository contains the **CauGym-GRPO-14B** model, a causal inference agent developed through targeted post-training of a 14B-parameter LLM.
+
+ - **Paper:** [Can Post-Training Transform LLMs into Causal Reasoners?](https://huggingface.co/papers/2602.06337)
+ - **Repository:** [GitHub - OpenCausaLab/CauGym](https://github.com/OpenCausaLab/CauGym)
20
+ ## Model Description
21
+ The model was fine-tuned using **Group Relative Policy Optimization (GRPO)** on the CauGym dataset, which covers seven core causal inference tasks across interventional and counterfactual domains. The research demonstrates that targeted post-training enables smaller models to perform competitively with or even surpass much larger counterparts on complex causal tasks.
22
+
23
+ ### Key Features
24
+ - **Backbone:** DeepSeek-R1-Distill-Qwen-14B.
25
+ - **High Performance:** Achieves 93.5% accuracy on the CaLM benchmark, compared to 55.4% by OpenAI o3.
26
+ - **Robustness:** Exhibits strong generalization under real-world conditions, such as distribution shifts and noisy data.
27
+ - **Internalization:** Capable of independently recognizing and applying causal theorems like the Backdoor Criterion.
28
+
29
+ ## Citation
30
+ If you use the CauGym dataset or reference this research, please cite:
31
+
32
+ ```bibtex
33
+ @misc{chen2026posttrainingtransformllmscausal,
34
+ title={Can Post-Training Transform LLMs into Causal Reasoners?},
35
+ author={Junqi Chen and Sirui Chen and Chaochao Lu},
36
+ year={2026},
37
+ eprint={2602.06337},
38
+ archivePrefix={arXiv},
39
+ primaryClass={cs.CL},
40
+ url={https://arxiv.org/abs/2602.06337},
41
+ }
42
+ ```
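
The model description in the diff above names Group Relative Policy Optimization (GRPO) as the fine-tuning method but does not spell out its core step. As a rough illustration only, and not the CauGym training code, the sketch below shows the group-relative advantage normalization commonly associated with GRPO: several completions are sampled for one prompt, each is scored, and each reward is standardized against the mean and standard deviation of its own group. The function name, the `eps` constant, the 0/1 reward values, and the use of the population standard deviation are all assumptions made for this example; real trainers may differ on each point.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own sampling group.

    `rewards` holds one scalar score per completion sampled for the
    same prompt. `eps` (an assumed constant) avoids division by zero
    when every completion in the group receives the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    # Population standard deviation is an assumption of this sketch;
    # some implementations use the sample standard deviation instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions for one causal question:
# 1.0 marks a correct final answer, 0.0 an incorrect one.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion scores the same carries no signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

Under these assumptions, completions that beat their group's average receive a positive advantage and the rest a negative one, so the policy update needs no separately learned value model. A group with identical rewards yields all-zero advantages and contributes nothing to the update.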