| --- |
| license: mit |
| library_name: transformers |
| pipeline_tag: text-generation |
| base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| datasets: |
| - zjunlp/KnowRL-Train-Data |
| --- |
| |
| <div align="center"> |
| <h1 align="center"> KnowRL </h1> |
| <h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3> |
|
|
| <p align="center"> |
| <a href="https://arxiv.org/abs/2506.19807">📄arXiv</a> • |
| <a href="https://github.com/zjunlp/KnowRL">💻GitHub Repo</a> • |
| <a href="https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data">📖Dataset</a> |
| </p> |
| </div> |
|
|
| --- |
|
|
| ## Model Description |
|
|
| **KnowRL-DeepSeek-R1-Distill-Qwen-7B** is a slow-thinking language model that results from applying our **KnowRL** framework to the base model `DeepSeek-R1-Distill-Qwen-7B`. |
|
|
| The **KnowRL (Knowledgeable Reinforcement Learning)** framework is designed to mitigate hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. During RL training, a reward signal explicitly encourages factual accuracy in the model's reasoning process, which helps the model learn its own knowledge boundaries. |
|
|
| As a result, this model demonstrates a significant reduction in hallucinations on factual benchmarks while preserving or even enhancing the strong reasoning capabilities inherited from its base model. |
|
|
| ## How to Use |
|
|
| ### Using the `transformers` Library |
|
|
| You can use this model with the `transformers` library for text generation tasks. It is important to follow the specific prompt format, which includes `<think>` and `<answer>` tags, to get the best results. |
|
|
| ```python |
| import torch |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # Set the device |
| device = "cuda" if torch.cuda.is_available() else "cpu" |
| |
| # Load the model and tokenizer |
| model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B" |
| tokenizer = AutoTokenizer.from_pretrained(model_name) |
| model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device) |
| |
| # Build the prompt with the tokenizer's chat template |
| prompt = "What is the main function of the mitochondria?" |
| messages = [ |
| {"role": "user", "content": prompt} |
| ] |
| text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
| |
| # Generate a response |
| inputs = tokenizer(text, return_tensors="pt").to(device) |
| outputs = model.generate(**inputs, max_new_tokens=512) |
| |
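| # Decode only the newly generated tokens (outputs[0] also contains the prompt) |
| new_tokens = outputs[0][inputs["input_ids"].shape[-1]:] |
| response = tokenizer.decode(new_tokens, skip_special_tokens=True) |
| print(response) |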
| ``` |
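Because the prompt format involves `<think>` and `<answer>` tags, it can be convenient to separate the reasoning from the final answer after generation. The helper below is a minimal sketch, not part of the KnowRL codebase: the name `split_reasoning_and_answer` is hypothetical, and it assumes the generated text looks roughly like `<think> ... </think> <answer> ... </answer>`. It tolerates a missing opening `<think>` tag (some chat templates place it in the prompt) and falls back to the remaining text when no `<answer>` block is found.

```python
import re
from typing import NamedTuple


class ParsedResponse(NamedTuple):
    reasoning: str
    answer: str


def split_reasoning_and_answer(text: str) -> ParsedResponse:
    """Split generated text into a reasoning part and a final answer.

    Assumption (not confirmed by the model card): the output follows the
    shape "<think> ... </think> <answer> ... </answer>".
    """
    # Everything before the closing </think> tag is treated as reasoning.
    head, sep, tail = text.partition("</think>")
    if sep:
        # Keep only what follows the last <think> tag, if one is present.
        reasoning = head.rpartition("<think>")[2].strip()
    else:
        # No reasoning block found: treat the whole text as answer material.
        reasoning, tail = "", text

    # Prefer an explicit <answer> block; otherwise use the remaining text.
    match = re.search(r"<answer>(.*?)</answer>", tail, flags=re.DOTALL)
    answer = match.group(1).strip() if match else tail.strip()
    return ParsedResponse(reasoning, answer)
```

Calling `split_reasoning_and_answer(response)` on the decoded output returns a named tuple with `reasoning` and `answer` fields. If the tags do not appear in the decoded text, the whole text is returned as the answer.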
|
|
| ### Using `huggingface-cli` |
|
|
| You can also download the model from the command line using `huggingface-cli`. |
|
|
| ```bash |
| huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B |
| ``` |
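The same download can also be done from Python. The sketch below uses `snapshot_download` from the `huggingface_hub` package, which is assumed to be installed; the wrapper name `download_model` is hypothetical and only there for illustration.

```python
from pathlib import Path


def download_model(local_dir: str = "KnowRL-DeepSeek-R1-Distill-Qwen-7B") -> Path:
    """Download the model repository into local_dir and return its path."""
    # Imported lazily so the helper can be defined even if the package
    # is not installed yet; calling it requires huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B",
        local_dir=local_dir,
    )
    return Path(path)
```

The returned directory can then be passed to `from_pretrained` in place of the repository ID.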
|
|
| ## Training Details |
|
|
| The model is trained with Knowledgeable Reinforcement Learning, specifically GRPO, on the `zjunlp/KnowRL-Train-Data` dataset. |
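For readers unfamiliar with GRPO, its core idea is to score several sampled completions for the same prompt and normalize each reward against the group. The snippet below is a minimal sketch of that normalization step only. It is not the KnowRL training code: the function name and the reward values are hypothetical, and the population standard deviation is one possible choice, since implementations differ on this detail.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of completions sampled for one prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions are judged relative to their siblings rather than by
    a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions with above-average reward receive a positive advantage and the rest a negative one. When every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal.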
|
|
| For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL). |
|
|
| --- |
|
|
| ## Citation |
|
|
| If you find this model useful in your research, please consider citing our paper: |
|
|
| ```bibtex |
| @article{ren2025knowrl, |
| title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}, |
| author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu}, |
| journal={arXiv preprint arXiv:2506.19807}, |
| year={2025} |
| } |
| ``` |