
	---
	license: mit
	library_name: transformers
	pipeline_tag: text-generation
	base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
	datasets:
	- zjunlp/KnowRL-Train-Data
	---

	<div align="center">
	<h1 align="center"> KnowRL </h1>
	<h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>

	<p align="center">
	<a href="https://arxiv.org/abs/2506.19807">📄arXiv</a> •
	<a href="https://github.com/zjunlp/KnowRL">💻GitHub Repo</a> •
	<a href="https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data">📖Dataset</a>
	</p>
	</div>

	---

	## Model Description

	KnowRL-DeepSeek-R1-Distill-Qwen-7B is a slow-thinking language model that results from applying our KnowRL framework to the base model `DeepSeek-R1-Distill-Qwen-7B`.

	The KnowRL (Knowledgeable Reinforcement Learning) framework is designed to mitigate hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. During reinforcement learning, a reward signal explicitly encourages factual accuracy in the model's reasoning process, helping it learn its own knowledge boundaries.

	As a result, this model demonstrates a significant reduction in hallucinations on factual benchmarks while preserving or even enhancing the strong reasoning capabilities inherited from its base model.

	## How to Use

	### Using the `transformers` Library

	You can use this model with the `transformers` library for text generation tasks. It is important to follow the specific prompt format, which includes `<think>` and `<answer>` tags, to get the best results.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Set the device
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Load the model and tokenizer
	model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

	# Define the prompt using the model's template
	prompt = "What is the main function of the mitochondria?"
	messages = [
	    {"role": "user", "content": prompt}
	]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Generate a response
	inputs = tokenizer(text, return_tensors="pt").to(device)
	outputs = model.generate(**inputs, max_new_tokens=512)

	# Decode only the newly generated tokens, skipping the echoed prompt
	new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
	response = tokenizer.decode(new_tokens, skip_special_tokens=True)
	print(response)
	```
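
Because the prompt format relies on `<think>` and `<answer>` tags, it can be convenient to separate the reasoning from the final answer after generation. The helper below is a minimal sketch, not part of the KnowRL codebase: the name `split_reasoning` is hypothetical, and it assumes the decoded response contains both tag pairs as literal text. If a tag pair is missing, the corresponding field is returned as `None`.

```python
import re


def split_reasoning(text):
    """Split a decoded response into (reasoning, answer).

    Assumes the response wraps its reasoning in <think>...</think> and its
    final answer in <answer>...</answer>; missing parts come back as None.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else None
    final = answer.group(1).strip() if answer else None
    return reasoning, final


# Hand-written sample string for illustration, not real model output
sample = "<think>Mitochondria produce ATP.</think>\n<answer>Energy production.</answer>"
reasoning, final = split_reasoning(sample)
```

Depending on the tokenizer and chat template, the opening `<think>` tag may not appear in the decoded text, so treat a `None` reasoning field as "not found" rather than as an error.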

	### Using `huggingface-cli`

	You can also download the model from the command line using `huggingface-cli`.

	```bash
	huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
	```
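
The same download can also be done from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the wrapper name `download_model` is hypothetical and exists only for illustration.

```python
def download_model(
    repo_id="zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B",
    local_dir="KnowRL-DeepSeek-R1-Distill-Qwen-7B",
):
    """Download every file of the model repository into local_dir.

    Returns the local path of the downloaded snapshot.
    """
    # Imported lazily so that defining this helper does not require the package
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model()` fetches the full set of weights, so expect a download of several gigabytes.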

	## Training Details

	The model is trained with Knowledgeable Reinforcement Learning, specifically GRPO (Group Relative Policy Optimization), on the `zjunlp/KnowRL-Train-Data` dataset.
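
To give a rough sense of what GRPO does with a reward signal, the sketch below computes group-relative advantages: several responses are sampled for one prompt, and each reward is normalized by the mean and standard deviation of its group. This illustrates the general technique only, not the KnowRL training code. The function name, the `eps` value, the use of the population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to the same prompt,
# e.g. 1.0 for a factually correct response and 0.0 otherwise
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's average receive a positive advantage and are reinforced, while those below average are discouraged.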

	For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).

	---

	## Citation

	If you find this model useful in your research, please consider citing our paper:

	```bibtex
	@article{ren2025knowrl,
	title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
	author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
	journal={arXiv preprint arXiv:2506.19807},
	year={2025}
	}
	```