Commit e64055b (verified) by Staticaliza · 1 parent: d925ff5 · Update README.md
---
license: apache-2.0
pipeline_tag: text-generation
---

# Reasoning Reya: The LLM

<img src="https://huggingface.co/Statical-Workspace/Storage/resolve/main/DepressedGirl.png" alt="icon" height="500" width="500">

A low-restriction reasoning, creative roleplay, and conversation model based on [ArliAI/QwQ-32B-ArliAI-RpR-v4](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4) and [huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated](https://huggingface.co/huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated).

This is a distilled, 4-bit GPTQ-quantized model (W4A16) that can run on a GPU.

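For a rough sense of what W4A16 buys you, here is a back-of-the-envelope sketch of weight memory for an 8-billion-parameter model (an estimate for the weights only — the KV cache and activations add more on top):

```python
# Back-of-the-envelope weight-memory estimate for an 8B-parameter model.
# W4A16 stores weights in 4 bits (0.5 bytes each) and computes in 16-bit activations.
n_params = 8_000_000_000

fp16_gib = n_params * 2 / 1024**3    # 2 bytes per weight at 16-bit precision
int4_gib = n_params * 0.5 / 1024**3  # 0.5 bytes per weight after GPTQ int4 quantization

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")  # ~14.9 GiB
print(f"int4 weights: ~{int4_gib:.1f} GiB")  # ~3.7 GiB
```

So the 4-bit weights fit comfortably on a single consumer GPU where the fp16 originals would not.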
# vLLM Use

```python
import torch
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA-graph capture, so the model loads faster at the
# cost of some tokens per second; set it to False for higher throughput.
llm = LLM(
    model=snapshot_download(
        repo_id="Staticaliza/Reya-Reasoning-8B-Distilled-GPTQ-Int4",
        allow_patterns=["*.json", "*.bin", "*.safetensors"],
    ),
    dtype="auto",
    tensor_parallel_size=torch.cuda.device_count(),
    enforce_eager=True,
    trust_remote_code=True,
)

params = SamplingParams(
    max_tokens=256,
    temperature=1,
    top_p=0.95,
    top_k=50,
    min_p=0.1,
    presence_penalty=0,
    frequency_penalty=0,
    repetition_penalty=1,
    stop=["<|im_end|>"],
    seed=42,
)

prompt = "Hello, who are you?"  # replace with your own prompt
result = llm.generate(prompt, params)[0].outputs[0].text
print(result)
```
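The `stop=["<|im_end|>"]` setting above suggests the model expects a ChatML-style prompt (an assumption inherited from its Qwen-family base models; the repo's tokenizer config is authoritative). A minimal sketch of building such a prompt by hand, under that assumption:

```python
# Hypothetical ChatML-style prompt builder (assumption: the model follows the
# <|im_start|>role ... <|im_end|> convention used by its Qwen-family bases).
def build_chatml_prompt(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to start its reply
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Reya, a creative roleplay assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
])
print(prompt)
```

Pass the resulting `prompt` to `llm.generate(prompt, params)` from the snippet above; the `<|im_end|>` stop string then ends generation at the close of the assistant turn.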