Update model card with installation and evaluation instructions

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +42 -8
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- agentica-org/DeepScaleR-Preview-Dataset
 base_model:
 - Vinnnf/Thinkless-1.5B-Warmup
-pipeline_tag: text-generation
+datasets:
+- agentica-org/DeepScaleR-Preview-Dataset
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # Thinkless: LLM Learns When to Think
@@ -73,7 +73,8 @@ prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$
 # prompt = "How many r's are in the word \"strawberry\""
 
 messages = [
-    {"role": "user", "content": f"{instruction}\n{prompt}"},
+    {"role": "user", "content": f"{instruction}
+{prompt}"},
 ]
 
 text = tokenizer.apply_chat_template(
@@ -103,10 +104,44 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 think_mode = ("<think>" in response)
 
 print(text+response)
-print(f"\nThink Mode: {think_mode}")
+print(f"
+Think Mode: {think_mode}")
 print(f"Number of tokens: {num_tokens}")
 ```
 
+## Installation
+
+```bash
+conda create -n thinkless python==3.10
+conda activate thinkless
+
+# For training
+cd Thinkless
+pip install torch==2.4.0 lm_eval==0.4.8 ray==2.45.0 # install lm_eval before verl to avoid conflict
+pip install -e ./verl
+pip install -e .
+# https://github.com/vllm-project/vllm/issues/4392
+pip install nvidia-cublas-cu12==12.4.5.8
+```
+
+## Evaluate the pre-trained model (Optional)
+
+#### LM-Eval
+This script will repeat the generation 5 times using lm_eval. All results will be saved in `./eval_results`.
+```bash
+bash run_eval.sh
+```
+
+#### Extract answers for evaluation
+We only use LM-Eval for generation but do not use the built-in answer extractor. Instead, we developed an [evaluation tool](scripts/eval) based on the prompts in [openai/simple-evals](https://github.com/openai/simple-evals). To obtain the final metrics, please run the following command:
+```bash
+bash scripts/eval/eval_all.sh YOUR_MODEL_PATH THE_EVAL_RESULTS_PATH
+```
+For example, to evaluate the results under *eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR*, run the following command:
+```bash
+bash scripts/eval/eval_all.sh Vinnnf/Thinkless-1.5B-RL-DeepScaleR eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR
+```
+
 ## Case Study
 
 **User:**
@@ -198,7 +233,6 @@ Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integ
 \]
 ```
 
-
 ## Citation
 If you find this work helpful, please cite:
 ```
@@ -208,4 +242,4 @@ If you find this work helpful, please cite:
 journal={arXiv preprint arXiv:2505.13379},
 year={2025}
 }
-```
+```
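A note on the prompt-formatting change in this diff: the updated snippet splits the user message across two source lines inside a single-quoted f-string, which Python rejects with a syntax error (a literal newline is only legal in triple-quoted strings), while the original escaped `\n` already yields the two-line message. A minimal sketch, with placeholder `instruction` and `prompt` values standing in for the README's own definitions:

```python
# Placeholder values (the README defines its own instruction and prompt).
instruction = "Solve the following math problem."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"

# Escaped newline, as in the original snippet: one string spanning two lines.
escaped = f"{instruction}\n{prompt}"

# Equivalent explicit construction of the same two-line message.
joined = "\n".join([instruction, prompt])

assert escaped == joined
assert escaped.splitlines() == [instruction, prompt]
```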
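The snippet's `think_mode = ("<think>" in response)` check only tests for the presence of a reasoning block. When just the final answer is wanted, the reasoning span can be dropped as well; below is a sketch assuming the model closes its reasoning with `</think>` (the `strip_think` helper is illustrative, not part of the released code):

```python
import re

def strip_think(response: str) -> tuple[bool, str]:
    """Return (think_mode, answer_text).

    think_mode mirrors the README's check: True when a <think> block appears.
    The answer is the response with any <think>...</think> span removed.
    """
    think_mode = "<think>" in response
    # DOTALL lets the pattern span the multi-line reasoning trace.
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return think_mode, answer

# Example with a synthetic response string:
mode, answer = strip_think("<think>9*4=36, 36-19=17</think>\nx = 17")
print(mode, answer)  # True x = 17
```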