---
license: apache-2.0
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: question-answering
metrics:
- accuracy
---

# OpenCausaLab/CauGym

CauGym is a model trained with GRPO (Group Relative Policy Optimization) on the VERL framework (https://github.com/verl-project/verl) and specialized for causal inference.

## Model Details

- **Developed by:** OpenCausaLab
- **Model type:** LLM
- **Language(s) (NLP):** English

### Model Sources

- **Repository:** https://github.com/OpenCausaLab/CauGym
- **Paper:** https://www.arxiv.org/abs/2602.06337

### Evaluation

We evaluated this model on the CALM benchmark and the CauGym benchmark. The evaluation metric is accuracy. Each column is a causal inference task: ATE (average treatment effect), CDE (controlled direct effect), ETT (effect of treatment on the treated), NDE (natural direct effect), NIE (natural indirect effect), PN (probability of necessity), and PS (probability of sufficiency).

| Benchmark | ATE | CDE | ETT | NDE | NIE | PN | PS |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **CALM** | 0.990 | 0.994 | 0.900 | 0.940 | 0.930 | 0.928 | 0.866 |
| **CauGym-rephrased** | 0.948 | 0.982 | 0.856 | 0.890 | 0.888 | 0.778 | 0.816 |
| **CauGym-omitted** | 0.935 | 0.963 | 0.837 | 0.934 | 0.838 | 0.900 | 0.907 |
| **CauGym-deconfounding** | 0.976 | 0.986 | 0.854 | 0.572 | 0.872 | 0.952 | 0.848 |
| **CauGym-redundant** | 0.972 | 0.966 | 0.918 | 0.850 | 0.888 | 0.934 | 0.910 |
| **CauGym-insufficient** | 0.884 | 0.902 | 0.686 | 0.696 | 0.958 | 0.940 | 0.954 |

## Citation

```latex
@misc{chen2026posttrainingtransformllmscausal,
      title={Can Post-Training Transform LLMs into Causal Reasoners?},
      author={Junqi Chen and Sirui Chen and Chaochao Lu},
      year={2026},
      eprint={2602.06337},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06337},
}
```
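
To compare rows of the accuracy table in the Evaluation section at a glance, the per-task scores can be summarized in a few lines of Python. This is a minimal sketch: the unweighted mean across tasks is an assumption made here for illustration and is not a metric reported by the authors, and the helper names `macro_average` and `weakest_task` are hypothetical.

```python
# Per-task accuracies copied from the Evaluation table of this model card.
TASKS = ["ATE", "CDE", "ETT", "NDE", "NIE", "PN", "PS"]

ACCURACY = {
    "CALM": [0.990, 0.994, 0.900, 0.940, 0.930, 0.928, 0.866],
    "CauGym-rephrased": [0.948, 0.982, 0.856, 0.890, 0.888, 0.778, 0.816],
    "CauGym-omitted": [0.935, 0.963, 0.837, 0.934, 0.838, 0.900, 0.907],
    "CauGym-deconfounding": [0.976, 0.986, 0.854, 0.572, 0.872, 0.952, 0.848],
    "CauGym-redundant": [0.972, 0.966, 0.918, 0.850, 0.888, 0.934, 0.910],
    "CauGym-insufficient": [0.884, 0.902, 0.686, 0.696, 0.958, 0.940, 0.954],
}


def macro_average(scores):
    """Unweighted mean over tasks (an illustrative summary, not an official metric)."""
    return sum(scores) / len(scores)


def weakest_task(scores):
    """Name of the task with the lowest accuracy in one table row."""
    index = min(range(len(scores)), key=scores.__getitem__)
    return TASKS[index]


# Map each benchmark to (mean accuracy, lowest-scoring task).
summary = {
    name: (macro_average(scores), weakest_task(scores))
    for name, scores in ACCURACY.items()
}

for name, (mean, weak) in summary.items():
    print(f"{name:<22} mean={mean:.3f}  weakest={weak}")
```

Because the table does not give the number of questions per task, an unweighted mean treats every task equally. If the tasks differ in size, a weighted mean would be the more faithful summary.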