Update README.md

README.md CHANGED
@@ -31,8 +31,8 @@ Beyond standard competencies such as factual knowledge and conversational abilities
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

-model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2", device_map="auto")
-tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")
+model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2-instruct", device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2-instruct")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
@@ -43,12 +43,41 @@ text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
-    add_generation_prompt=True
+    add_generation_prompt=True,
+    reasoning_effort="high"  # or "medium"/"low"
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
+
+Alternatively, you can serve the model with vLLM:
+
+```
+vllm serve LLM360/K2-V2-Instruct --tensor-parallel-size 8 --port 8000 --revision "sft_final" &
+```
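+
+Once the server is up, you can confirm it is serving the model (an illustrative check, assuming the `openai` package; vLLM exposes an OpenAI-compatible API):
+
+```python
+from openai import OpenAI
+
+# Point the standard OpenAI client at the local vLLM server; any string works as the API key.
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="key")
+
+# /v1/models lists the models the server has loaded.
+print([m.id for m in client.models.list()])
+```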
+
+K2-V2-Instruct reads `reasoning_effort="low"|"medium"|"high"` from the chat template to control how much reasoning the model performs. If you cannot call `tokenizer.apply_chat_template` yourself, you can pass the same argument through `chat_template_kwargs` in the request body (the OpenAI Python client sends this via `extra_body`):
+
+```
+curl -X POST "http://localhost:8000/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer key" \
+  -d '{
+    "model": "LLM360/K2-V2-Instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Explain why the derivative of sin(x) is cos(x)."
+      }
+    ],
+    "chat_template_kwargs": {
+      "reasoning_effort": "high"
+    }
+  }'
+```
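+
+The same request through the OpenAI Python client (a minimal sketch: `extra_body` is how this client forwards non-standard fields such as `chat_template_kwargs` into the request body):
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="key")
+
+response = client.chat.completions.create(
+    model="LLM360/K2-V2-Instruct",
+    messages=[
+        {"role": "user", "content": "Explain why the derivative of sin(x) is cos(x)."}
+    ],
+    # chat_template_kwargs is a vLLM extension; extra_body merges it into the JSON body.
+    extra_body={"chat_template_kwargs": {"reasoning_effort": "high"}},
+)
+print(response.choices[0].message.content)
+```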
+
---

## **Evaluation Summary**