Improve model card and add missing information #1
by nielsr (HF Staff) - opened

README.md
CHANGED

Old version (removed lines marked `-`):

@@ -3,21 +3,33 @@ library_name: transformers
tags:
- generated_from_trainer
- open-r1
-
---

-# Model Card for

-This
-It has been trained using [TRL](https://github.com/huggingface/trl).

-

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

@@ -26,8 +38,7 @@ print(output["generated_text"])

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)

-
-This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

### Framework versions

@@ -39,7 +50,14 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing

## Citations

-

```bibtex
@article{zhihong2024deepseekmath,

@@ -48,11 +66,8 @@ Cite TRL as:
  year = 2024,
  eprint = {arXiv:2402.03300},
}
-
```

-Cite TRL as:
-
```bibtex
@misc{vonwerra2022trl,
  title = {{TRL: Transformer Reinforcement Learning}},

@@ -62,4 +77,6 @@ Cite TRL as:
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
-```
New version (added lines marked `+`):

tags:
- generated_from_trainer
- open-r1
+license: cc-by-4.0
+pipeline_tag: text-generation
---

+# Model Card for CANOE Models

+This repository contains several fine-tuned LLMs trained using the CANOE framework, as described in [Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning](https://huggingface.co/papers/2505.16483). CANOE improves the contextual faithfulness of LLMs in both short-form and long-form generation without requiring human annotations. It synthesizes short-form question-answering data and employs a rule-based reinforcement learning method (Dual-GRPO) to optimize response generation.
+
+## Available Models
+
+Here is a list of the available CANOE models:
+
+| Model | Hugging Face Checkpoint | Base Model | Description |
+|-------|-------------------------|------------|-------------|
+| **CANOE-LLaMA3-8B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-LLaMA3-8B) | `meta-llama/Meta-Llama-3-8B-Instruct` | Chat model, based on LLaMA3-Instruct-8B. |
+| **CANOE-Qwen2.5-7B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-7B) | `Qwen/Qwen2.5-7B-Instruct` | Chat model, based on Qwen2.5-Instruct-7B. |
+| **CANOE-Qwen2.5-14B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-14B) | `Qwen/Qwen2.5-14B-Instruct` | Chat model, based on Qwen2.5-Instruct-14B. |
+
+## Quick Start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="ssz1111/CANOE-LLaMA3-8B", device="cuda")  # use any of the CANOE checkpoints here
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)

+This model was trained using the CANOE framework, which synthesizes short-form question-answering data and uses a rule-based reinforcement learning method (Dual-GRPO). The training data and evaluation datasets are detailed in the GitHub README.
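To make "rule-based" concrete: in methods of this family, sampled generations are scored by deterministic string checks rather than a learned reward model. The sketch below is illustrative only, assuming a SQuAD-style exact-match rule for short-form QA; it is not the paper's actual Dual-GRPO reward, and the function names are hypothetical.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace
    (the usual SQuAD-style answer normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def rule_based_reward(generation: str, gold_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the generation exactly matches
    the gold answer after normalization, else 0.0. No reward model needed."""
    return 1.0 if normalize(generation) == normalize(gold_answer) else 0.0

# e.g. rule_based_reward("The Eiffel Tower.", "eiffel tower") evaluates to 1.0
```

Because such a reward is computed from the synthesized QA data itself, training requires no human-annotated preferences, which matches the model card's "without requiring human annotations" claim.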
### Framework versions

## Citations

+```bibtex
+@article{si2025teaching,
+  title={Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning},
+  author={Si, Shuzheng and Zhao, Haozhe and Gao, Cheng and Bai, Yuzhuo and Wang, Zhitong and Gao, Bofei and Luo, Kangyang and Li, Wenhao and Huang, Yufei and Chen, Gang and others},
+  journal={arXiv preprint arXiv:2505.16483},
+  year={2025}
+}
+```
```bibtex
@article{zhihong2024deepseekmath,

  year = 2024,
  eprint = {arXiv:2402.03300},
}
```

```bibtex
@misc{vonwerra2022trl,
  title = {{TRL: Transformer Reinforcement Learning}},

  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
+```
+
+GitHub repository: [CANOE GitHub](https://github.com/huggingface/CANOE)