---
license: mit
---

<div align="center">
<h1 align="center"> KnowRL </h1>
<h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>

<p align="center">
  <a href="https://arxiv.org/abs/2506.19807">📄arXiv</a> •
  <a href="https://github.com/zjunlp/KnowRL">💻GitHub Repo</a> •
  <a href="https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data">📖Dataset</a>
</p>
</div>

---

## Model Description

**KnowRL-DeepSeek-R1-Distill-Qwen-7B** is a slow-thinking language model obtained by applying our **KnowRL** framework to the base model `DeepSeek-R1-Distill-Qwen-7B`.

The **KnowRL (Knowledgeable Reinforcement Learning)** framework mitigates hallucination in Large Language Models (LLMs) by integrating external knowledge directly into the training process. During RL training, the reward signal explicitly encourages factual accuracy in the model's reasoning, helping it learn its own knowledge boundaries.

As a result, this model shows a significant reduction in hallucination on factual benchmarks while preserving, and in some cases enhancing, the strong reasoning capabilities inherited from its base model.

## How to Use

### Using the `transformers` Library

You can use this model with the `transformers` library for text generation. To get the best results, it is important to follow the model's prompt format, whose responses include `<think>` and `<answer>` tags.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Build the prompt using the model's chat template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
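
The decoded response interleaves the model's reasoning with its final answer. As a minimal post-processing sketch (assuming the output follows the `<think>`/`<answer>` tag format described above; the helper name is ours), you can pull out just the answer span:

```python
import re

def extract_answer(response: str) -> str:
    """Return the text inside the <answer>...</answer> tags,
    falling back to the full response if no tags are found."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

print(extract_answer("<think>Recall cell biology.</think><answer>ATP production</answer>"))
# ATP production
```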

### Using `huggingface-cli`

You can also download the model from the command line using `huggingface-cli`.

```bash
huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
```

## Training Details

The model is trained with Knowledgeable Reinforcement Learning (specifically GRPO) on data from the `zjunlp/KnowRL-Train-Data` dataset.

For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
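
To make the training signal concrete, here is a hypothetical sketch of the kind of knowledge-grounded reward GRPO can optimize: score a sampled response by the fraction of its atomic facts that are supported by an external knowledge source. The function, its inputs, and the fact-extraction step are illustrative assumptions, not KnowRL's exact reward implementation.

```python
# Illustrative sketch only: assumes facts have already been extracted from
# the response and the knowledge source as normalized strings.
def factuality_reward(answer_facts: set[str], knowledge_base: set[str]) -> float:
    """Fraction of the response's atomic facts supported by external knowledge.

    A response with no checkable facts gets zero reward, which discourages
    vacuous answers.
    """
    if not answer_facts:
        return 0.0
    supported = len(answer_facts & knowledge_base)
    return supported / len(answer_facts)
```

In a GRPO-style loop, a score like this would be combined with format and correctness rewards and used to compute group-relative advantages across sampled responses.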

---

## Citation

If you find this model useful in your research, please consider citing our paper:

```bibtex
@article{ren2025knowrl,
  title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}
```