Safetensors · English · qwen2

sliuau committed · verified · Commit 2abe2e9 · 1 Parent(s): 039ada8

Update README.md

Files changed (1): README.md (+16 −53)

README.md CHANGED
@@ -13,48 +13,6 @@ DLER-Qwen-R1-7B is an ultra-efficient 7B open-weight reasoning model designed fo
 
 This model is for research and development only.
 
-### Deployment Geography:
-Global <br>
-
-### Use Case: <br>
-Researchers and developers can use this model to solve math, coding, and STEM questions.
-
-### Release Date: <br>
-Hugging Face 9/10/2025 via https://huggingface.co/nvidia/DLER-R1-7B <br>
-
-
-## Model Architecture:
-**Architecture Type:** Dense decoder-only Transformer model <br>
-
-**Network Architecture:** [DeepSeek-R1-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) <br>
-
-**This model was developed based on DeepSeek-R1-7B.** <br>
-
-## Software Integration:
-**Runtime Engine(s):** Transformers
-
-**Supported Hardware Microarchitecture Compatibility:** <br>
-* NVIDIA Ampere <br>
-* NVIDIA Hopper <br>
-
-
-**Preferred/Supported Operating System(s):**
-* Linux <br>
-
-The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
-
-## Model Version(s):
-1.0
-
-
-### Training Dataset:
-
-| Dataset | Link |
-|----------------------------|----------------------------------------------------------------------------------|
-| DeepScaleR-Preview-Dataset | [Link](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset)  |
-
-**Properties:** 479K question and answer pairs <br>
-
 ### Evaluation Results:
 **Benchmark Scores** <br>
 
@@ -63,7 +21,11 @@ The integration of foundation and fine-tuned models into AI systems requires add
 | Deepseek-R1-7B | 93.60 | 3999 | 55.40 | 13241 | 82.90 | 7461 | 49.79 | 5199 | 58.21 | 8837 | 7747 |
 | **DLER-R1-7B** | **94.21 (+0.61%)** | **1634 (-60%)** | **55.62 (+0.22%)** | **3230 (-76%)** | **84.41 (+1.51%)** | **2512 (-66%)** | **53.88 (+4.09%)** | **2058 (-61%)** | **60.48 (+2.27%)** | **2592 (-71%)** | **2405 (-69%)** |
 
+### Environment Setup
 
+```
+pip install transformers==4.51.3
+```
 # Inference:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -96,26 +58,27 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
 ### License/Terms of Use
-TBD
-https://docs.google.com/spreadsheets/d/15AiIBHLsm-HY1RZH5nkaA0siE-5grHke9uFYaiD_28E/edit?gid=1088371820#gid=1088371820
+NSCLv1
 
 ## Ethical Considerations:
 NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
 
+
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 
 ## Citation
-If you find our dataset helpful, please cite the following [paper]():
-
-
-
+If you find our model helpful, please cite the following [paper](https://arxiv.org/abs/2510.15110):
 
 ```
+@misc{liu2025dlerdoinglengthpenalty,
+      title={DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning},
+      author={Shih-Yang Liu and Xin Dong and Ximing Lu and Shizhe Diao and Mingjie Liu and Min-Hung Chen and Hongxu Yin and Yu-Chiang Frank Wang and Kwang-Ting Cheng and Yejin Choi and Jan Kautz and Pavlo Molchanov},
+      year={2025},
+      eprint={2510.15110},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2510.15110},
+}
 ```
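Editor's note: as a quick sanity check on the benchmark table, the bracketed percentages next to the response-length columns are relative reductions computed from the raw average token counts. A short sketch, with the token counts copied from the table (the published percentages appear to be rounded from unrounded averages, so recomputing from the rounded counts can differ by about a point):

```python
# Average response lengths (tokens) per benchmark, copied from the table:
# DeepSeek-R1-7B baseline vs. DLER-R1-7B.
baseline = [3999, 13241, 7461, 5199, 8837, 7747]
dler = [1634, 3230, 2512, 2058, 2592, 2405]

for b, d in zip(baseline, dler):
    reduction = 100 * (d - b) / b  # negative: DLER answers are shorter
    print(f"{b:>6} -> {d:>5} tokens ({reduction:+.0f}%)")
```

Across benchmarks this works out to roughly 59-76% shorter responses, consistent with the table's reported reductions.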
84