---
license: apache-2.0
library_name: transformers
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: text-generation
tags:
  - causal-inference
  - causal-reasoning
  - reinforcement-learning
  - grpo
---

# Can Post-Training Transform LLMs into Causal Reasoners?

This repository contains the CauGym-GRPO-14B model, a causal inference agent developed through targeted post-training of a 14B parameter LLM.

## Model Description

The model was fine-tuned using Group Relative Policy Optimization (GRPO) on the CauGym dataset, which covers seven core causal inference tasks across interventional and counterfactual domains. The research demonstrates that targeted post-training enables smaller models to perform competitively with or even surpass much larger counterparts on complex causal tasks.

## Key Features

- **Backbone:** DeepSeek-R1-Distill-Qwen-14B.
- **High performance:** Achieves 93.5% accuracy on the CaLM benchmark, compared to 55.4% for OpenAI o3.
- **Robustness:** Generalizes well under real-world conditions such as distribution shifts and noisy data.
- **Internalization:** Independently recognizes and applies causal theorems such as the Backdoor Criterion.
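## Usage

Since the model is a standard `transformers` causal language model, it can be queried with the usual chat-template workflow. The sketch below is a minimal example, not an official snippet: the Hub repo id `CauGym/CauGym-GRPO-14B` and the sample question are hypothetical placeholders, so substitute the actual repository id of this model card.

```python
# Minimal usage sketch for CauGym-GRPO-14B with the transformers library.
# NOTE: the repo id below is an assumed placeholder; replace it with the
# actual Hugging Face Hub id of this model.
MODEL_ID = "CauGym/CauGym-GRPO-14B"  # hypothetical repo id


def build_messages(question: str) -> list[dict]:
    # Wrap a causal-inference question in the chat-message format
    # expected by the tokenizer's chat template.
    return [{"role": "user", "content": question}]


def main() -> None:
    # Imported lazily so the helper above can be used without the
    # (large) model dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the checkpoint's native precision
        device_map="auto",    # requires `accelerate` for multi-GPU/offload
    )

    messages = build_messages(
        "In the causal graph X -> Y with confounder Z (Z -> X, Z -> Y), "
        "is the effect of X on Y identifiable via the backdoor criterion?"
    )
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Reasoning-style models benefit from a generous token budget.
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    print(
        tokenizer.decode(
            output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
        )
    )


if __name__ == "__main__":
    main()
```

Loading a 14B-parameter checkpoint requires roughly 28 GB of GPU memory in bf16; quantized loading (e.g. via `bitsandbytes`) can reduce this if needed.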

## Citation

If you use the CauGym dataset or reference this research, please cite:

```bibtex
@misc{chen2026posttrainingtransformllmscausal,
      title={Can Post-Training Transform LLMs into Causal Reasoners?},
      author={Junqi Chen and Sirui Chen and Chaochao Lu},
      year={2026},
      eprint={2602.06337},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06337},
}
```