Update model card with installation and evaluation instructions (#1)
Opened by nielsr (HF Staff)

README.md (CHANGED)
@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- agentica-org/DeepScaleR-Preview-Dataset
 base_model:
 - Vinnnf/Thinkless-1.5B-Warmup
-
+datasets:
+- agentica-org/DeepScaleR-Preview-Dataset
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # Thinkless: LLM Learns When to Think

@@ -73,7 +73,8 @@ prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$
 # prompt = "How many r's are in the word \"strawberry\""
 
 messages = [
-    {"role": "user", "content": f"{instruction}
+    {"role": "user", "content": f"{instruction}
+{prompt}"},
 ]
 
 text = tokenizer.apply_chat_template(

@@ -103,10 +104,44 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 think_mode = ("<think>" in response)
 
 print(text+response)
-print(f"
+print(f"
+Think Mode: {think_mode}")
 print(f"Number of tokens: {num_tokens}")
 ```
 
+## Installation
+
+```bash
+conda create -n thinkless python==3.10
+conda activate thinkless
+
+# For training
+cd Thinkless
+pip install torch==2.4.0 lm_eval==0.4.8 ray==2.45.0 # install lm_eval before verl to avoid conflict
+pip install -e ./verl
+pip install -e .
+# https://github.com/vllm-project/vllm/issues/4392
+pip install nvidia-cublas-cu12==12.4.5.8
+```
+
+## Evaluate the pre-trained model (Optional)
+
+#### LM-Eval
+This script repeats the generation 5 times using lm_eval. All results will be saved in `./eval_results`.
+```bash
+bash run_eval.sh
+```
+
+#### Extract answers for evaluation
+We only use LM-Eval for generation but do not use the built-in answer extractor. Instead, we developed an [evaluation tool](scripts/eval) based on the prompts in [openai/simple-evals](https://github.com/openai/simple-evals). To obtain the final metrics, please run the following command:
+```bash
+bash scripts/eval/eval_all.sh YOUR_MODEL_PATH THE_EVAL_RESULTS_PATH
+```
+For example, to evaluate the results under *eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR*, run the following command:
+```bash
+bash scripts/eval/eval_all.sh Vinnnf/Thinkless-1.5B-RL-DeepScaleR eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR
+```
+
 ## Case Study
 
 **User:**

@@ -198,7 +233,6 @@ Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integ
 \]
 ```
 
-
 ## Citation
 If you find this work helpful, please cite:
 ```

@@ -208,4 +242,4 @@ If you find this work helpful, please cite:
 journal={arXiv preprint arXiv:2505.13379},
 year={2025}
 }
-```
+```
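The inference snippet touched by this diff joins the instruction and the problem with a newline inside an f-string, then checks whether the decoded response contains a `<think>` tag to tell which mode the model chose. A minimal, self-contained sketch of that round trip, with placeholder instruction text and canned responses standing in for the real tokenizer and model calls (none of these literal values come from the card):

```python
def build_user_content(instruction: str, prompt: str) -> str:
    """Join instruction and problem with a newline, as the card's f-string does."""
    return f"{instruction}\n{prompt}"

def detect_think_mode(response: str) -> bool:
    """The model signals long-form reasoning by emitting a <think> tag."""
    return "<think>" in response

# Placeholder values; the real card feeds these through apply_chat_template and the model.
content = build_user_content(
    "Solve the problem.",  # hypothetical instruction, not the card's actual one
    "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?",
)
messages = [{"role": "user", "content": content}]

short_response = "x = 17"                                        # canned short-mode output
long_response = "<think>7 + 2 + x + 10 = 36, so x = 17.</think>\nx = 17"  # canned think-mode output

print(f"Think Mode (short): {detect_think_mode(short_response)}")  # False
print(f"Think Mode (long): {detect_think_mode(long_response)}")    # True
```

The tag check mirrors the card's `think_mode = ("<think>" in response)` line; everything else here is scaffolding so the routing logic can be run without loading the model.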