nielsr (HF Staff) committed
Commit 376a8a9 · verified · 1 Parent(s): 0a5d291

Update model card with installation and evaluation instructions


This PR adds the installation and evaluation instructions from the GitHub README to the model card for better usability.

Files changed (1)
  1. README.md +42 -8
README.md CHANGED

````diff
@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- agentica-org/DeepScaleR-Preview-Dataset
 base_model:
 - Vinnnf/Thinkless-1.5B-Warmup
-pipeline_tag: text-generation
+datasets:
+- agentica-org/DeepScaleR-Preview-Dataset
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # Thinkless: LLM Learns When to Think
@@ -73,7 +73,8 @@ prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$
 # prompt = "How many r's are in the word \"strawberry\""
 
 messages = [
-    {"role": "user", "content": f"{instruction}\n{prompt}"},
+    {"role": "user", "content": f"{instruction}
+{prompt}"},
 ]
 
 text = tokenizer.apply_chat_template(
@@ -103,10 +104,44 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 think_mode = ("<think>" in response)
 
 print(text+response)
-print(f"\nThink Mode: {think_mode}")
+print(f"
+Think Mode: {think_mode}")
 print(f"Number of tokens: {num_tokens}")
 ```
 
+## Installation
+
+```bash
+conda create -n thinkless python==3.10
+conda activate thinkless
+
+# For training
+cd Thinkless
+pip install torch==2.4.0 lm_eval==0.4.8 ray==2.45.0 # install lm_eval before verl to avoid conflict
+pip install -e ./verl
+pip install -e .
+# https://github.com/vllm-project/vllm/issues/4392
+pip install nvidia-cublas-cu12==12.4.5.8
+```
+
+## Evaluate the pre-trained model (Optional)
+
+#### LM-Eval
+This script will repeat the generation for 5 times using lm_eval. All results will be saved in `./eval_results`.
+```bash
+bash run_eval.sh
+```
+
+#### Extract answers for evaluation
+We only use LM-Eval for generation but do not use the built-in answer extractor. Instead, we developed an [evaluation tool](scripts/eval) based on the prompts in [openai/simple-evals](https://github.com/openai/simple-evals). To obtain the final metrics, please run the following command:
+```bash
+bash scripts/eval/eval_all.sh YOUR_MODEL_PATH THE_EVAL_RESULTS_PATH
+```
+For example, to evaluate the results under *eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR*, run the following command:
+```bash
+bash scripts/eval/eval_all.sh Vinnnf/Thinkless-1.5B-RL-DeepScaleR eval_results/Vinnnf__Thinkless-1.5B-RL-DeepScaleR
+```
+
 ## Case Study
 
 **User:**
@@ -198,7 +233,6 @@ Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integ
 \]
 ```
 
-
 ## Citation
 If you find this work helpful, please cite:
 ```
@@ -208,4 +242,4 @@ If you find this work helpful, please cite:
 journal={arXiv preprint arXiv:2505.13379},
 year={2025}
 }
-```
+```
````
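A caveat for readers copying the updated snippet: the new `+` lines break the f-strings across physical lines, but a double-quoted f-string cannot contain a literal line break in Python, so the original `\n` escape is presumably what was intended. A minimal sketch under that assumption, with hypothetical `instruction`/`prompt` values standing in for the card's own:

```python
# Hypothetical stand-ins: the model card defines its own instruction and prompt.
instruction = "Solve the following problem."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"

# The "\n" escape keeps the message on one source line; a literal line
# break inside a double-quoted f-string would be a SyntaxError.
messages = [
    {"role": "user", "content": f"{instruction}\n{prompt}"},
]
```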
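The `think_mode` line in the snippet is a plain substring test on the decoded response; a self-contained sketch of that check (the function name is ours, not from the card):

```python
def is_think_mode(response: str) -> bool:
    # The card flags "think" mode when the decoded response
    # contains the model's <think> tag.
    return "<think>" in response
```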