---
license: apache-2.0
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: question-answering
metrics:
- accuracy
---
# OpenCausaLab/CauGym

CauGym is a large language model specialized for causal inference. It is post-trained from DeepSeek-R1-Distill-Qwen-14B with GRPO (Group Relative Policy Optimization) using the verl framework (https://github.com/verl-project/verl).
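
GRPO's central idea can be illustrated briefly. For each prompt, the policy samples a group of responses, and each response's advantage is its reward normalized by the mean and standard deviation of the rewards in that group, so no learned value critic is needed. This is a minimal sketch, not verl's implementation: the clipped policy-ratio objective and the KL penalty are omitted, the function name `group_relative_advantages` is hypothetical, and the binary correctness reward is an assumption (this card does not specify CauGym's reward function).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (hypothetical helper).

    rewards: scores for the responses sampled for ONE prompt.
    Uses the population standard deviation; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed binary reward for 4 sampled answers to one causal question:
# 1.0 = final answer judged correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

When every response in a group receives the same reward, all advantages are zero, so that prompt contributes no reward-driven policy-gradient signal. Implementations differ on details such as population versus sample standard deviation.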

## Model Details

- **Developed by:** OpenCausaLab
- **Model type:** Large language model (LLM), fine-tuned from DeepSeek-R1-Distill-Qwen-14B.
- **Language(s) (NLP):** English.


### Model Sources 


- **Repository:** https://github.com/OpenCausaLab/CauGym
- **Paper:** https://www.arxiv.org/abs/2602.06337


### Evaluation

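We evaluate the model on the CALM benchmark and on the CauGym benchmark variants listed below, reporting accuracy for each of seven causal estimands: average treatment effect (ATE), controlled direct effect (CDE), effect of the treatment on the treated (ETT), natural direct effect (NDE), natural indirect effect (NIE), probability of necessity (PN), and probability of sufficiency (PS).
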
| Benchmark | ATE | CDE | ETT | NDE | NIE | PN | PS |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **CALM** | 0.990 | 0.994 | 0.900 | 0.940 | 0.930 | 0.928 | 0.866 |
| **CauGym-rephrased**| 0.948 | 0.982 | 0.856 | 0.890 | 0.888 | 0.778 | 0.816 |
| **CauGym-omitted** | 0.935 | 0.963 | 0.837 | 0.934 | 0.838 | 0.900 | 0.907 |
| **CauGym-deconfounding** | 0.976 | 0.986 | 0.854 | 0.572 | 0.872 | 0.952 | 0.848 |
| **CauGym-redundant** | 0.972 | 0.966 | 0.918 | 0.850 | 0.888 | 0.934 | 0.910 |
| **CauGym-insufficient** | 0.884 | 0.902 | 0.686 | 0.696 | 0.958 | 0.940 | 0.954 |
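
To compare benchmarks at a glance, the table can be summarized per row. The sketch below copies the reported accuracies and computes an unweighted mean over the seven estimands plus the weakest estimand for each benchmark. This summary is an illustration, not a metric reported by the authors: it weights every estimand equally because the card does not give per-estimand sample counts, and the helper names `macro_average` and `weakest_estimand` are hypothetical.

```python
# Accuracies copied from the evaluation table
# (column order: ATE, CDE, ETT, NDE, NIE, PN, PS).
ESTIMANDS = ["ATE", "CDE", "ETT", "NDE", "NIE", "PN", "PS"]
RESULTS = {
    "CALM": [0.990, 0.994, 0.900, 0.940, 0.930, 0.928, 0.866],
    "CauGym-rephrased": [0.948, 0.982, 0.856, 0.890, 0.888, 0.778, 0.816],
    "CauGym-omitted": [0.935, 0.963, 0.837, 0.934, 0.838, 0.900, 0.907],
    "CauGym-deconfounding": [0.976, 0.986, 0.854, 0.572, 0.872, 0.952, 0.848],
    "CauGym-redundant": [0.972, 0.966, 0.918, 0.850, 0.888, 0.934, 0.910],
    "CauGym-insufficient": [0.884, 0.902, 0.686, 0.696, 0.958, 0.940, 0.954],
}


def macro_average(scores):
    """Unweighted mean accuracy over the estimands (equal weight each)."""
    return sum(scores) / len(scores)


def weakest_estimand(scores):
    """Name of the estimand with the lowest accuracy in one table row."""
    return ESTIMANDS[scores.index(min(scores))]


for benchmark, scores in RESULTS.items():
    print(f"{benchmark:<22} mean={macro_average(scores):.3f} "
          f"weakest={weakest_estimand(scores)}")
```

For example, the lowest single entry in the table is NDE on CauGym-deconfounding (0.572), which a row mean alone would hide.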


## Citation 

```bibtex
@misc{chen2026posttrainingtransformllmscausal,
      title={Can Post-Training Transform LLMs into Causal Reasoners?}, 
      author={Junqi Chen and Sirui Chen and Chaochao Lu},
      year={2026},
      eprint={2602.06337},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06337}, 
}
```