Staticaliza committed · verified
Commit 00f0a50 · 1 Parent(s): 7337aa5

Update README.md

Files changed (1): README.md +38 -2
README.md CHANGED
@@ -2,6 +2,42 @@
 license: apache-2.0
 pipeline_tag: text-generation
 ---
-# x: y
-
-:)
+
+# Reya Human: Information
+
+<img src="https://huggingface.co/Statical-Workspace/Storage/resolve/main/DepressedGirl.png" alt="icon" height="500" width="500">
+
+This is a low-restriction, creative roleplay and conversational model based on "phi-4".
+
+It can chat like a human (only when configured correctly)!
+
+I have distilled the model and quantized it to 4-bit with GPTQ (W4A16), meaning it can run on most GPUs.
+
+Established by Staticaliza.
+
+# vLLM: Use Instruction
+
+```python
+import torch
+from huggingface_hub import snapshot_download
+from vllm import LLM, SamplingParams
+
+# Consider setting "enforce_eager" to False to enable CUDA graphs for more tokens per second, at the expense of a slower model load.
+repo = snapshot_download(repo_id="Staticaliza/Reya-Human", allow_patterns=["*.json", "*.bin", "*.safetensors"])
+llm = LLM(model=repo, dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)
+
+params = SamplingParams(
+    max_tokens=256,
+    temperature=1,
+    top_p=0.35,
+    top_k=50,
+    min_p=0.05,
+    presence_penalty=0,
+    frequency_penalty=0,
+    repetition_penalty=1,
+    stop=["<|im_end|>"],
+    seed=42,
+)
+
+prompt = "Hello!"  # a raw prompt; apply the model's chat template for best results
+result = llm.generate(prompt, params)[0].outputs[0].text
+print(result)
+```
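Since the example stops generation on `<|im_end|>`, the model presumably expects a ChatML-style prompt rather than raw text ("only when configured correctly"). Below is a minimal sketch of such formatting; the `format_chatml` helper is hypothetical, and the exact separators are an assumption (some phi-4 builds use additional tokens such as `<|im_sep|>`), so verify against the chat template bundled in the model's `tokenizer_config.json`.

```python
# Hypothetical ChatML-style prompt builder; assumes the model uses plain
# <|im_start|>/<|im_end|> markers — check the bundled chat template to confirm.
def format_chatml(messages):
    parts = []
    for m in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are Reya, a creative roleplay companion."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

The resulting string can be passed to `llm.generate(...)` in place of a raw prompt; generation then naturally terminates at the configured `<|im_end|>` stop token.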