Staticaliza committed · Commit 02addc8 · verified · 1 Parent(s): e64055b

Update README.md

Files changed (1): README.md (+9 −6)
README.md CHANGED
````diff
@@ -3,22 +3,26 @@ license: apache-2.0
 pipeline_tag: text-generation
 ---
 
-# Reasoning Reya: The LLM
+# Reasoning Reya: Information
 
 <img src="https://huggingface.co/Statical-Workspace/Storage/resolve/main/DepressedGirl.png" alt="icon" height="500" width="500">
 
-An low restriction reasoning creative roleplay and conversation model based on [ArliAI/QwQ-32B-ArliAI-RpR-v4](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4) and [huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated](https://huggingface.co/huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated).
+This is a low-restriction, creative roleplay and conversational reasoning model based on [ArliAI/QwQ-32B-ArliAI-RpR-v4](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4) and [huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated](https://huggingface.co/huihui-ai/DeepSeek-R1-0528-Qwen3-8B-abliterated).
 
-This is a distilled and quantized 4 bit GPTQ model (W4A16) that can run on GPU.
+I have distilled and quantized the model to 4-bit with GPTQ (W4A16), meaning it can run on most GPUs.
 
-# vLLM Use
+Established by Staticaliza.
+
+# vLLM: Usage Instructions
 
 ```python
 from huggingface_hub import snapshot_download
 from vllm import LLM, SamplingParams
+import torch  # needed for torch.cuda.device_count() below
 
 # Consider toggling "enforce_eager" to False if you want to load the model quicker, at the expense of tokens per second.
-llm = LLM(model=snapshot_download(repo_id="Staticaliza/Reya-Reasoning-8B-Distilled-GPTQ-Int4", allow_patterns=["*.json", "*.bin", "*.safetensors"]), dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)
+repo = snapshot_download(repo_id="Staticaliza/Reya-Reasoning-8B-Distilled-GPTQ-Int4", allow_patterns=["*.json", "*.bin", "*.safetensors"])
+llm = LLM(model=repo, dtype="auto", tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True, trust_remote_code=True)
 
 params = SamplingParams(
 max_tokens=256,
@@ -33,6 +37,6 @@ params = SamplingParams(
 seed=42,
 )
 
-result = model.generate(input, params)[0].outputs[0].text
+result = llm.generate(["Hello, Reya!"], params)[0].outputs[0].text
 print(result)
 ```
````
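
As a side note on the W4A16 scheme named in the updated description: weights are stored as 4-bit integers while activations (and compute) stay in 16-bit floats. A minimal sketch of symmetric 4-bit weight quantization with a single per-tensor scale — real GPTQ uses per-group scales and error compensation, so this is only illustrative:

```python
import numpy as np

def quantize_w4(w: np.ndarray):
    """Quantize float weights to signed 4-bit codes with one scale (sketch only)."""
    scale = np.abs(w).max() / 7.0  # map the largest weight to the int4 edge (-8..7 range)
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Compute happens in 16-bit floats: this is the "A16" half of W4A16.
    return q.astype(np.float16) * np.float16(scale)

w = np.array([0.12, -0.5, 0.33, 0.7], dtype=np.float32)
q, s = quantize_w4(w)
w_hat = dequantize(q, s)
print(q)      # 4-bit integer codes, e.g. [ 1 -5  3  7]
print(w_hat)  # approximate reconstruction of w
```

The point of the sketch: storage drops to roughly a quarter of FP16, at the cost of a small per-weight rounding error.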
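
Both parent models are DeepSeek-R1-style reasoners, so generations typically wrap the chain of thought in `<think>…</think>` tags before the final answer. That is an assumption about this distill — verify against the model's chat template — but if it holds, a small helper can separate reasoning from answer:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, answer); tolerate a missing tag pair."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()  # no reasoning block found
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

sample = "<think>The user greeted me.</think>Hello! How can I help?"
r, a = split_reasoning(sample)
print(r)  # The user greeted me.
print(a)  # Hello! How can I help?
```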