nielsr (HF Staff) committed
Commit 376a8a9 · verified · 1 Parent(s): 0a5d291

Update model card with installation and evaluation instructions


This PR adds the installation and evaluation instructions from the GitHub README to the model card for better usability.

Files changed (1)
  1. README.md +42 -8
README.md CHANGED

````diff
@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- agentica-org/DeepScaleR-Preview-Dataset
 base_model:
 - Vinnnf/Thinkless-1.5B-Warmup
-pipeline_tag: text-generation
+datasets:
+- agentica-org/DeepScaleR-Preview-Dataset
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # Thinkless: LLM Learns When to Think
@@ -73,7 +73,8 @@ prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$
 # prompt = "How many r's are in the word \"strawberry\""
 
 messages = [
-    {"role": "user", "content": f"{instruction}\n{prompt}"},
+    {"role": "user", "content": f"{instruction}
+{prompt}"},
 ]
 
 text = tokenizer.apply_chat_template(
@@ -103,10 +104,44 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 think_mode = ("<think>" in response)
 
 print(text+response)
-print(f"\nThink Mode: {think_mode}")
+print(f"
+Think Mode: {think_mode}")
 print(f"Number of tokens: {num_tokens}")
 ```
 
+## Installation
+
+```bash
+conda create -n thinkless python==3.10
+conda activate thinkless
+
+# For training
+cd Thinkless
+pip install torch==2.4.0 lm_eval==0.4.8 ray==2.45.0 # install lm_eval before verl to avoid conflict
+pip install -e ./verl
+pip install -e .
+# https://github.com/vllm-project/vllm/issues/4392
+pip install nvidia-cublas-cu12==12.4.5.8
+```
+
+## Evaluate the pre-trained model (Optional)
+
+#### LM-Eval
+This script will repeat the generation for 5 times using lm_eval. All results will be saved in `./eval_results`.
+```bash
+bash run_eval.sh
+```
+
+#### Extract answers for evaluation
+We only use LM-Eval for generation but do not use the built-in answer extractor. Instead, we developed an [evaluation tool](scripts/eval) based on the prompts in [openai/simple-evals](https://github.com/openai/simple-evals). To obtain the final metrics, please run the following command:
+```bash
+bash scripts/eval/eval_all.sh YOUR_MODEL_PATH THE_EVAL_RESULTS_PATH
+```
+For example, to evaluate the results under *eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR*, run the following command:
+```bash
+bash scripts/eval/eval_all.sh Vinnnf/Thinkless-1.5B-RL-DeepScaleR eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR
+```
+
 ## Case Study
 
 **User:**
@@ -198,7 +233,6 @@ Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integ
 \]
 ```
 
-
 ## Citation
 If you find this work helpful, please cite:
 ```
@@ -208,4 +242,4 @@ If you find this work helpful, please cite:
 journal={arXiv preprint arXiv:2505.13379},
 year={2025}
 }
-```
+```
````
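A caveat for readers copying the updated snippet: the new `+` lines break the f-strings across physical lines, but a double-quoted f-string cannot contain a literal line break in Python, so the original `\n` escape is presumably what was intended. A minimal sketch under that assumption, with hypothetical `instruction`/`prompt` values standing in for the card's own:

```python
# Hypothetical stand-ins: the model card defines its own instruction and prompt.
instruction = "Solve the following problem."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"

# The "\n" escape keeps the message on one source line; a literal line
# break inside a double-quoted f-string would be a SyntaxError.
messages = [
    {"role": "user", "content": f"{instruction}\n{prompt}"},
]
```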
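The `think_mode` line in the snippet is a plain substring test on the decoded response; a self-contained sketch of that check (the function name is ours, not from the card):

```python
def is_think_mode(response: str) -> bool:
    # The card flags "think" mode when the decoded response
    # contains the model's <think> tag.
    return "<think>" in response
```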