---
license: apache-2.0
pipeline_tag: text-generation
---

# ReyaChat

<img src="https://huggingface.co/Statical-Workspace/Storage/resolve/main/DepressedGirl.png" alt="icon" height="500" width="500">

This is a low-restriction, creative roleplay and conversational model based on [SicariusSicariiStuff/Phi-lthy4](https://huggingface.co/SicariusSicariiStuff/Phi-lthy4), a finetune of [microsoft/phi-4](https://huggingface.co/microsoft/phi-4), along with [SicariusSicariiStuff/Impish_Magic_24B](https://huggingface.co/SicariusSicariiStuff/Impish_Magic_24B).

It can chat like a human (only when configured correctly)!

I have distilled the model and quantized it to 4 bits with GPTQ (W4A16: 4-bit weights, 16-bit activations), meaning it can run on most GPUs.

Note: I made this model for personal use. The repository is public so that anyone else can use it, but do not expect me to answer requests.

Established by Staticaliza.

# vLLM: Usage Instructions

```python
import torch
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# Download only the config, tokenizer, and weight files.
repo = snapshot_download(repo_id="Staticaliza/Reya-Human", allow_patterns=["*.json", "*.bin", "*.safetensors"])

# enforce_eager=True skips CUDA graph capture, so the model loads quicker at the expense of tokens per second.
# Consider setting it to False if you want faster generation and can accept a slower startup.
llm = LLM(model=repo, dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)

# ChatML is suggested
prompt = """<|im_start|>system
You are Reya.<|im_end|>
<|im_start|>user
Hi.<|im_end|>
<|im_start|>assistant
"""

params = SamplingParams(
    max_tokens=256,
    temperature=1,
    top_p=0.35,
    top_k=50,
    min_p=0.05,
    presence_penalty=0,
    frequency_penalty=0,
    repetition_penalty=1,
    stop=["<|im_end|>"],
    seed=42,
)

# generate() returns one result per prompt; take the text of the first completion.
result = llm.generate(prompt, params)[0].outputs[0].text
print(result)
```
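
For conversations longer than a single turn, writing the ChatML string by hand gets error-prone. Below is a minimal sketch of a helper that renders a list of messages into the same ChatML layout used above. The function name `build_chatml_prompt` and its `add_generation_prompt` parameter are hypothetical and not part of this repository or of vLLM; the sketch only assumes the `<|im_start|>` / `<|im_end|>` markers shown in the example.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages as a ChatML prompt string (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Reya."},
    {"role": "user", "content": "Hi."},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be passed to `llm.generate` in place of the hand-written one. To continue a conversation, append the model's reply as an `assistant` message, add the next `user` message, and rebuild the prompt.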