Update README with GGUF format documentation and usage instructions
- **LoRA Adapter**: Smaller adapter files (`adapter_model.safetensors`, `adapter_config.json`)
- **Tokenizer**: Shared tokenizer files for both options

## GGUF Format Models

This repository also includes GGUF format models optimized for use with **llama.cpp**, **Ollama**, and other GGUF-compatible inference engines. These formats offer good performance and broad compatibility across platforms.

### Available GGUF Models

| File | Size | Format | Use Case | RAM Required |
|------|------|--------|----------|--------------|
| `merged-sci-model.gguf` | 14GB | F16 | Maximum quality inference | ~16GB |
| `merged-sci-model-q4_k_m.gguf` | 4.1GB | Q4_K_M | Balanced quality/performance | ~6GB |
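The size and RAM figures can be sanity-checked with simple bits-per-weight arithmetic. A back-of-envelope sketch, assuming roughly 7B parameters (implied by the 14GB F16 file) and ~4.85 bits per weight on average for Q4_K_M — both approximations, not values measured from these files:

```python
# Back-of-envelope GGUF size estimate.
# Assumptions (not from the repo): ~7e9 parameters; Q4_K_M averages
# roughly 4.85 bits per weight; F16 is exactly 16 bits per weight.
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

f16 = model_size_gb(7e9, 16.0)    # ~14 GB, matches the table
q4km = model_size_gb(7e9, 4.85)   # ~4.2 GB, close to the listed 4.1 GB

# Runtime RAM adds the KV cache and scratch buffers on top of the
# weights, hence the higher ~16 GB / ~6 GB figures in the table.
print(f"F16 ≈ {f16:.1f} GB, Q4_K_M ≈ {q4km:.1f} GB")
```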

### Usage with Ollama

**1. Download and create Modelfile:**
```bash
# Download the quantized model (recommended)
wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./merged-sci-model-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

**2. Create and run the model:**
```bash
ollama create sci-assistant -f Modelfile
ollama run sci-assistant "What are the signs of autonomic dysreflexia?"
```
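Once `ollama create` has registered the model, a running Ollama server can also be queried over its local HTTP API. A minimal sketch using only the standard library; the endpoint and JSON shape follow Ollama's standard `/api/generate` interface, and the model name matches the `ollama create` step above:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """JSON body for a single non-streaming /api/generate request."""
    return {
        "model": "sci-assistant",  # the name given to `ollama create`
        "prompt": prompt,
        "stream": False,           # one JSON reply instead of a token stream
        "options": {"temperature": 0.7, "top_p": 0.9},
    }

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST to a locally running Ollama server and return the response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("What are the signs of autonomic dysreflexia?"))
```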

### Usage with llama.cpp

**1. Install and setup:**
```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download model
wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
```

**2. Interactive chat:**
```bash
# -e makes llama.cpp expand the \n escapes in the prefix/suffix strings
./main -m merged-sci-model-q4_k_m.gguf \
  --temp 0.7 \
  --repeat_penalty 1.1 \
  -c 4096 \
  --interactive \
  -e \
  --in-prefix "<|im_start|>user\n" \
  --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
```

**3. Single prompt:**
```bash
# -e makes llama.cpp expand the \n escapes inside the prompt string
./main -m merged-sci-model-q4_k_m.gguf \
  --temp 0.7 \
  -c 2048 \
  -e \
  -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n"
```
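The same GGUF file can also be driven from Python instead of the CLI via the community `llama-cpp-python` bindings. A minimal sketch, assuming `pip install llama-cpp-python` and the model file in the working directory:

```python
def chatml_prompt(user_msg: str) -> str:
    """Wrap one user turn in the ChatML template the GGUF models expect."""
    system = ("You are a specialized medical assistant for people with "
              "spinal cord injuries.")
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user_msg}<|im_end|>\n"
            f"<|im_start|>assistant\n")

if __name__ == "__main__":
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path="merged-sci-model-q4_k_m.gguf", n_ctx=4096)
    out = llm(
        chatml_prompt("What exercises are good for someone with paraplegia?"),
        max_tokens=512,
        temperature=0.7,
        stop=["<|im_end|>"],
    )
    print(out["choices"][0]["text"])
```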

### Performance Comparison

- **F16 Model** (`merged-sci-model.gguf`): Maximum quality, larger memory footprint
- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): 99%+ quality retention, roughly 3.4x smaller, recommended for most users

Both models use the **ChatML** template format and support up to **32K context length**.
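For clients that keep chat history, the ChatML format generalizes to multiple turns. A small illustrative helper (the `to_chatml` name is ours, not from any library):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render an OpenAI-style message list as a ChatML prompt string.

    Each message is a {"role": ..., "content": ...} dict; the trailing
    assistant header cues the model to generate the next reply.
    """
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return rendered + "<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a specialized medical assistant "
                                  "for people with spinal cord injuries."},
    {"role": "user", "content": "What are the signs of autonomic dysreflexia?"},
])
```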

## Intended Use

This model is designed to: