Update README.md
Browse files
README.md
CHANGED
|
@@ -11,6 +11,8 @@ This is a low restriction, creative roleplay and conversational reasoning model
|
|
| 11 |
|
| 12 |
I have distilled and quantized the model using GPTQ 4-bit quantization (W4A16), meaning it can run on most GPUs.
|
| 13 |
|
|
|
|
|
|
|
| 14 |
Established by Staticaliza.
|
| 15 |
|
| 16 |
# vLLM: Usage Instructions
|
|
@@ -23,6 +25,14 @@ from vllm import LLM, SamplingParams
|
|
| 23 |
repo = snapshot_download(repo_id="Staticaliza/Reya-Reasoning", allow_patterns=["*.json", "*.bin", "*.safetensors"])
|
| 24 |
llm = LLM(model=repo, dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)
|
| 25 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
params = SamplingParams(
|
| 27 |
max_tokens=256,
|
| 28 |
temperature=1,
|
|
|
|
| 11 |
|
| 12 |
I have distilled and quantized the model using GPTQ 4-bit quantization (W4A16), meaning it can run on most GPUs.
|
| 13 |
|
| 14 |
+
This model uses the ChatML chat template and can think using "<think>" and "</think>" tokens.
|
| 15 |
+
|
| 16 |
Established by Staticaliza.
|
| 17 |
|
| 18 |
# vLLM: Usage Instructions
|
|
|
|
| 25 |
repo = snapshot_download(repo_id="Staticaliza/Reya-Reasoning", allow_patterns=["*.json", "*.bin", "*.safetensors"])
|
| 26 |
llm = LLM(model=repo, dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)
|
| 27 |
|
| 28 |
+
# ChatML with think tokens is suggested
|
| 29 |
+
input = """<|im_start|>system
|
| 30 |
+
You are Reya.<|im_end|>
|
| 31 |
+
<|im_start|>user
|
| 32 |
+
Hi.<|im_end|>
|
| 33 |
+
<|im_start|>assistant
|
| 34 |
+
<think>"""
|
| 35 |
+
|
| 36 |
params = SamplingParams(
|
| 37 |
max_tokens=256,
|
| 38 |
temperature=1,
|