---
license: mit
---

<div align="center">
<h1 align="center"> KnowRL </h1>
<h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>

<p align="center">
  <a href="https://arxiv.org/abs/2506.19807">📄arXiv</a> •
  <a href="https://github.com/zjunlp/KnowRL">💻GitHub Repo</a> •
  <a href="https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data">📖Dataset</a>
</p>
</div>

---

## Model Description

**KnowRL-DeepSeek-R1-Distill-Qwen-7B** is a slow-thinking language model obtained by applying our **KnowRL** framework to the base model `DeepSeek-R1-Distill-Qwen-7B`.

The **KnowRL (Knowledgeable Reinforcement Learning)** framework mitigates hallucination in Large Language Models (LLMs) by integrating external knowledge directly into the training process. During RL training, the reward signal explicitly encourages factual accuracy in the model's reasoning, helping it learn its own knowledge boundaries.

As a result, this model shows a significant reduction in hallucination on factual benchmarks while preserving, and in some cases enhancing, the strong reasoning capabilities inherited from its base model.

## How to Use

### Using the `transformers` Library

You can use this model with the `transformers` library for text generation. To get the best results, it is important to follow the model's prompt format, whose responses include `<think>` and `<answer>` tags.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Build the prompt using the model's chat template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
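
The decoded response interleaves the model's reasoning with its final answer. As a minimal post-processing sketch (assuming the output follows the `<think>`/`<answer>` tag format described above; the helper name is ours), you can pull out just the answer span:

```python
import re

def extract_answer(response: str) -> str:
    """Return the text inside the <answer>...</answer> tags,
    falling back to the full response if no tags are found."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

print(extract_answer("<think>Recall cell biology.</think><answer>ATP production</answer>"))
# ATP production
```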

### Using `huggingface-cli`

You can also download the model from the command line using `huggingface-cli`.

```bash
huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
```

## Training Details

The model is trained with Knowledgeable Reinforcement Learning (specifically GRPO) on data from the `zjunlp/KnowRL-Train-Data` dataset.

For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
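
To make the training signal concrete, here is a hypothetical sketch of the kind of knowledge-grounded reward GRPO can optimize: score a sampled response by the fraction of its atomic facts that are supported by an external knowledge source. The function, its inputs, and the fact-extraction step are illustrative assumptions, not KnowRL's exact reward implementation.

```python
# Illustrative sketch only: assumes facts have already been extracted from
# the response and the knowledge source as normalized strings.
def factuality_reward(answer_facts: set[str], knowledge_base: set[str]) -> float:
    """Fraction of the response's atomic facts supported by external knowledge.

    A response with no checkable facts gets zero reward, which discourages
    vacuous answers.
    """
    if not answer_facts:
        return 0.0
    supported = len(answer_facts & knowledge_base)
    return supported / len(answer_facts)
```

In a GRPO-style loop, a score like this would be combined with format and correctness rewards and used to compute group-relative advantages across sampled responses.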

---

## Citation

If you find this model useful in your research, please consider citing our paper:

```bibtex
@article{ren2025knowrl,
  title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}
```