QwerkyAI
/

QRe-Llama-3-8B-Instruct-QDistill

Text Generation

qwerky_llama_mamba_hybrid

Model card Files Files and versions

Qwerky-DB commited on Feb 26

Commit

35c8c54

·

verified ·

1 Parent(s): 13a30fe

Update README.md

Files changed (1) hide show

README.md +11 -5

README.md CHANGED Viewed

@@ -12,9 +12,9 @@ library_name: transformers
 pipeline_tag: text-generation
 ---
-# QwerkyLlamaMambaHybrid
-Hybrid Mamba-Transformer model from Qwerky AI.
 ## Requirements
@@ -30,15 +30,15 @@ pip install transformers torch safetensors
 pip install flash-attn mamba-ssm causal-conv1d --no-build-isolation
 ```
-## Usage
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("QwerkyAI/Qwick-8B-Instruct")
 model = AutoModelForCausalLM.from_pretrained(
-    "QwerkyAI/Qwick-8B-Instruct",
     torch_dtype=torch.bfloat16,
     device_map="auto",
     trust_remote_code=True
@@ -49,6 +49,12 @@ outputs = model.generate(**inputs, max_new_tokens=50)
 print(tokenizer.decode(outputs[0]))
 ```
 ## Model Files
 - `config.json` - Model configuration with `auto_map`

 pipeline_tag: text-generation
 ---
+# QRe Llama 3 8B Instruct - QDistill
+This is a hybrid Mamba-Transformer model based on the Llama 3.1 architecture, distilled from Llama 3.3 70B into a 8B parameter model using Qwerky's proprietary distillation method. The model uses MAMBA layers interleaved with attention layers for efficient sequence modeling. The results are a 8B parameter model comparable in quality to Llama's 3.1 8B but running at speeds as fast or faster than Llama's 3.2 3B model.
 ## Requirements
 pip install flash-attn mamba-ssm causal-conv1d --no-build-isolation
 ```
+## Usage - Transformers
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill")
 model = AutoModelForCausalLM.from_pretrained(
+    "QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill",
     torch_dtype=torch.bfloat16,
     device_map="auto",
     trust_remote_code=True
 print(tokenizer.decode(outputs[0]))
 ```
+## Usage - vLLM
+```bash
+pip install vllm qwerky-vllm-models
+vllm serve QwerkyAI/QRe-Llama-3-8B-Instruct-QDistill
+```
 ## Model Files
 - `config.json` - Model configuration with `auto_map`