+---
+license: apache-2.0
+tags:
+  - gguf
+  - llama.cpp
+  - ios
+  - mobile
+  - qwen2.5
+  - quantized
+base_model: Qwen/Qwen2.5-3B-Instruct
+model_type: qwen2
+language:
+  - en
+  - tr
+---
+# Qwen2.5-3B-Instruct-Q4_K_M-GGUF
+A GGUF-quantized build of Qwen2.5-3B-Instruct, intended for mobile and edge deployment.
+## Model Details
+- **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
+- **Quantization:** Q4_K_M (4-bit quantization with K-quants)
+- **File Size:** ~1.8 GB
+- **Format:** GGUF
+## Usage
+### With llama.cpp
+```bash
+./llama-cli -m Qwen2.5-3B-Instruct-Q4_K_M.gguf -p "Hello, how are you?"
+```
+### With llama.swiftui (iOS)
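+The Q4_K_M quantization keeps this model small enough to run on iOS devices with the llama.swiftui example app from llama.cpp.
+1. Download the `.gguf` model file.
+2. Copy it to the app's Documents folder.
+3. Load the model in the app and start chatting.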
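Step 1 can be done from a terminal before the file is transferred to the device. The sketch below assumes the repository id `TurkishCodeMan/Qwen2.5-3B-Instruct-Q4_K_M-GGUF` and the file name `Qwen2.5-3B-Instruct-Q4_K_M.gguf` (the name used in the llama.cpp command above). `hf_file_url` is a hypothetical helper that builds a Hugging Face `resolve` URL; it is not part of any tool.

```shell
#!/usr/bin/env bash
# Assumed repository id and file name; adjust them if the repo differs.
REPO_ID="TurkishCodeMan/Qwen2.5-3B-Instruct-Q4_K_M-GGUF"
MODEL_FILE="Qwen2.5-3B-Instruct-Q4_K_M.gguf"

# Hypothetical helper: print the direct download URL for a file on the main branch.
hf_file_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

url="$(hf_file_url "$REPO_ID" "$MODEL_FILE")"
echo "$url"

# Uncomment to download (about 1.8 GB).
# -L follows redirects, -C - resumes a partially downloaded file.
# curl -L -C - -o "$MODEL_FILE" "$url"
```

The download line is left commented out so the script can be run safely to check the URL first.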
+### Chat Template
+The model expects ChatML-style prompts. Replace `{user_message}` with the user's text and let the model continue after the final `assistant` line:
+```
+<|im_start|>system
+You are a helpful assistant.<|im_end|>
+<|im_start|>user
+{user_message}<|im_end|>
+<|im_start|>assistant
+```
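When a raw prompt is passed to the model instead of relying on built-in chat handling, the template above has to be filled in by hand. A minimal sketch for a single-turn conversation follows; `build_chatml_prompt` is a hypothetical helper name, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the ChatML template with a system and a user message.
build_chatml_prompt() {
  local system_message="$1"
  local user_message="$2"
  # Messages are passed as %s arguments, so "%" or "\" in them stays literal.
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system_message" "$user_message"
}

# Command substitution drops the trailing newline after "assistant".
prompt="$(build_chatml_prompt "You are a helpful assistant." "Hello, how are you?")"
printf '%s\n' "$prompt"
```

The resulting string can be supplied as the `-p` argument of the `llama-cli` command shown earlier.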
+## Performance
+| Device | Generation speed (tokens/s) |
+|--------|-----------------------------|
+| iPhone 15 Pro | ~15-25 |
+| iPhone 14 | ~10-15 |
+| M1 Mac | ~30-50 |
+## License
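+This repository is tagged Apache 2.0. The quantized weights derive from Qwen/Qwen2.5-3B-Instruct and remain subject to that model's license; the base model card lists the Qwen Research License, so check its terms before use.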
+## Credits
+- Original model by [Qwen Team](https://huggingface.co/Qwen)
+- Quantization and mobile optimization by TurkishCodeMan