Staticaliza committed · verified
Commit 00f0a50 · 1 Parent(s): 7337aa5

Update README.md

Files changed (1): README.md +38 -2
README.md CHANGED
@@ -2,6 +2,42 @@
 license: apache-2.0
 pipeline_tag: text-generation
 ---
-# x: y
-
-:)
+
+# Reya Human: Information
+
+<img src="https://huggingface.co/Statical-Workspace/Storage/resolve/main/DepressedGirl.png" alt="icon" height="500" width="500">
+
+This is a low-restriction, creative roleplay and conversational model based on "phi-4".
+
+It can chat like a human (only when configured correctly)!
+
+I have distilled the model and quantized it to 4-bit with GPTQ (W4A16), meaning it can run on most GPUs.
+
+Established by Staticaliza.
+
+# vLLM: Use Instruction
+
+```python
+import torch
+from huggingface_hub import snapshot_download
+from vllm import LLM, SamplingParams
+
+# Consider setting "enforce_eager" to False to enable CUDA graphs for more tokens per second, at the expense of a slower model load.
+repo = snapshot_download(repo_id="Staticaliza/Reya-Human", allow_patterns=["*.json", "*.bin", "*.safetensors"])
+llm = LLM(model=repo, dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)
+
+params = SamplingParams(
+    max_tokens=256,
+    temperature=1,
+    top_p=0.35,
+    top_k=50,
+    min_p=0.05,
+    presence_penalty=0,
+    frequency_penalty=0,
+    repetition_penalty=1,
+    stop=["<|im_end|>"],
+    seed=42,
+)
+
+prompt = "Hello!"  # a raw prompt; apply the model's chat template for best results
+result = llm.generate(prompt, params)[0].outputs[0].text
+print(result)
+```
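Since the example stops generation on `<|im_end|>`, the model presumably expects a ChatML-style prompt rather than raw text ("only when configured correctly"). Below is a minimal sketch of such formatting; the `format_chatml` helper is hypothetical, and the exact separators are an assumption (some phi-4 builds use additional tokens such as `<|im_sep|>`), so verify against the chat template bundled in the model's `tokenizer_config.json`.

```python
# Hypothetical ChatML-style prompt builder; assumes the model uses plain
# <|im_start|>/<|im_end|> markers — check the bundled chat template to confirm.
def format_chatml(messages):
    parts = []
    for m in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are Reya, a creative roleplay companion."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

The resulting string can be passed to `llm.generate(...)` in place of a raw prompt; generation then naturally terminates at the configured `<|im_end|>` stop token.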