Commit e64055b (verified) by Staticaliza · 1 parent: d925ff5 · Update README.md
---
license: apache-2.0
pipeline_tag: text-generation
---

# Reasoning Reya: The LLM

<img src="https://huggingface.co/Statical-Workspace/Storage/resolve/main/DepressedGirl.png" alt="icon" height="500" width="500">

A low-restriction reasoning, creative roleplay, and conversation model based on [ArliAI/QwQ-32B-ArliAI-RpR-v4](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4) and [huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated](https://huggingface.co/huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated).

This is a distilled, 4-bit GPTQ-quantized model (W4A16) that can run on a GPU.

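For a rough sense of what W4A16 buys you, here is a back-of-the-envelope sketch of weight memory for an 8-billion-parameter model (an estimate for the weights only — the KV cache and activations add more on top):

```python
# Back-of-the-envelope weight-memory estimate for an 8B-parameter model.
# W4A16 stores weights in 4 bits (0.5 bytes each) and computes in 16-bit activations.
n_params = 8_000_000_000

fp16_gib = n_params * 2 / 1024**3    # 2 bytes per weight at 16-bit precision
int4_gib = n_params * 0.5 / 1024**3  # 0.5 bytes per weight after GPTQ int4 quantization

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")  # ~14.9 GiB
print(f"int4 weights: ~{int4_gib:.1f} GiB")  # ~3.7 GiB
```

So the 4-bit weights fit comfortably on a single consumer GPU where the fp16 originals would not.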
# vLLM Use

```python
import torch
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA-graph capture, so the model loads faster at the
# cost of some tokens per second; set it to False for higher throughput.
llm = LLM(
    model=snapshot_download(
        repo_id="Staticaliza/Reya-Reasoning-8B-Distilled-GPTQ-Int4",
        allow_patterns=["*.json", "*.bin", "*.safetensors"],
    ),
    dtype="auto",
    tensor_parallel_size=torch.cuda.device_count(),
    enforce_eager=True,
    trust_remote_code=True,
)

params = SamplingParams(
    max_tokens=256,
    temperature=1,
    top_p=0.95,
    top_k=50,
    min_p=0.1,
    presence_penalty=0,
    frequency_penalty=0,
    repetition_penalty=1,
    stop=["<|im_end|>"],
    seed=42,
)

prompt = "Hello, who are you?"  # replace with your own prompt
result = llm.generate(prompt, params)[0].outputs[0].text
print(result)
```
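The `stop=["<|im_end|>"]` setting above suggests the model expects a ChatML-style prompt (an assumption inherited from its Qwen-family base models; the repo's tokenizer config is authoritative). A minimal sketch of building such a prompt by hand, under that assumption:

```python
# Hypothetical ChatML-style prompt builder (assumption: the model follows the
# <|im_start|>role ... <|im_end|> convention used by its Qwen-family bases).
def build_chatml_prompt(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to start its reply
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Reya, a creative roleplay assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
])
print(prompt)
```

Pass the resulting `prompt` to `llm.generate(prompt, params)` from the snippet above; the `<|im_end|>` stop string then ends generation at the close of the assistant turn.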