Qwerky-DB committed on
Commit 35c8c54 · verified · 1 Parent(s): 13a30fe

Update README.md

Files changed (1)
  1. README.md +11 -5
README.md CHANGED
@@ -12,9 +12,9 @@ library_name: transformers
 pipeline_tag: text-generation
 ---
 
-# QwerkyLlamaMambaHybrid
+# QRe Llama 3 8B Instruct - QDistill
 
-Hybrid Mamba-Transformer model from Qwerky AI.
+This is a hybrid Mamba-Transformer model based on the Llama 3.1 architecture, distilled from Llama 3.3 70B into an 8B-parameter model using Qwerky's proprietary distillation method. The model interleaves Mamba layers with attention layers for efficient sequence modeling. The result is an 8B-parameter model comparable in quality to Llama 3.1 8B while running as fast as or faster than Llama 3.2 3B.
 
 ## Requirements
 
@@ -30,15 +30,15 @@ pip install transformers torch safetensors
 pip install flash-attn mamba-ssm causal-conv1d --no-build-isolation
 ```
 
-## Usage
+## Usage - Transformers
 
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("QwerkyAI/Qwick-8B-Instruct")
+tokenizer = AutoTokenizer.from_pretrained("QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill")
 model = AutoModelForCausalLM.from_pretrained(
-    "QwerkyAI/Qwick-8B-Instruct",
+    "QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill",
     torch_dtype=torch.bfloat16,
     device_map="auto",
     trust_remote_code=True
@@ -49,6 +49,12 @@ outputs = model.generate(**inputs, max_new_tokens=50)
 print(tokenizer.decode(outputs[0]))
 ```
 
+## Usage - vLLM
+```bash
+pip install vllm qwerky-vllm-models
+vllm serve QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill
+```
+
 ## Model Files
 
 - `config.json` - Model configuration with `auto_map`
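The `vllm serve` command added in this commit exposes an OpenAI-compatible HTTP API (by default on `localhost:8000` under `/v1`). A minimal client sketch for querying it, assuming the server is running locally and the served model name matches the repo id from the diff (the prompt text is illustrative):

```python
import json
from urllib.request import Request, urlopen

# OpenAI-style chat-completions payload; the model name is the repo id
# served by `vllm serve` in the README above.
payload = {
    "model": "QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill",
    "messages": [{"role": "user", "content": "Explain Mamba layers in one sentence."}],
    "max_tokens": 50,
}

req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client (pointed at `base_url="http://localhost:8000/v1"`) works the same way.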