Add sunflower GGUF quantized models

Files changed (13)

.gitattributes +9 -0
.ipynb_checkpoints/README-checkpoint.md +229 -0
Modelfile +22 -0
README.md +229 -0
sunflower-32B-f16.gguf +3 -0
sunflower-32B-iq1_s.gguf +3 -0
sunflower-32B-iq2_xxs.gguf +3 -0
sunflower-32B-q4_k_m.gguf +3 -0
sunflower-32B-q5_k_m.gguf +3 -0
sunflower-32B-q6_k.gguf +3 -0
sunflower-32B-q8_0.gguf +3 -0
sunflower-32B-tq1_0.gguf +3 -0
sunflower-imatrix.dat +3 -0

.gitattributes CHANGED

@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-f16.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-iq1_s.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-iq2_xxs.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-32B-tq1_0.gguf filter=lfs diff=lfs merge=lfs -text
+sunflower-imatrix.dat filter=lfs diff=lfs merge=lfs -text

Modelfile ADDED

	@@ -0,0 +1,22 @@

+FROM sunflower-32B-q4_k_m.gguf
+# System message
+SYSTEM """You are a linguist and translator specialising in Ugandan languages, made by Sunbird AI."""
+TEMPLATE """<|im_start|>system
+{{ .System }}<|im_end|>
+<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+{{ .Response }}<|im_end|>"""
+# Stop tokens
+PARAMETER stop "<|im_start|>"
+PARAMETER stop "<|im_end|>"
+# Quality parameters
+PARAMETER temperature 0.3
+PARAMETER top_p 0.95
+PARAMETER top_k 40
+PARAMETER repeat_penalty 1.1
+PARAMETER num_ctx 4096
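
As a rough illustration of what the `TEMPLATE` in this Modelfile produces, the Python sketch below renders the same ChatML layout for a single user turn. The function name `render_chatml` is hypothetical, and this is not how Ollama evaluates templates (it uses Go templates internally); it only shows the prompt string the model would see, up to the point where `{{ .Response }}` is generated.

```python
# System prompt copied from the Modelfile above.
SYSTEM = (
    "You are a linguist and translator specialising in Ugandan languages, "
    "made by Sunbird AI."
)


def render_chatml(prompt: str, system: str = SYSTEM) -> str:
    """Return the ChatML text the model sees before it starts generating."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Print the rendered prompt for one example request.
print(render_chatml("Translate to Luganda: Good morning"))
```

Generation then continues after the final `<|im_start|>assistant` line and ends at the first stop token, which is why both `<|im_start|>` and `<|im_end|>` are listed as `PARAMETER stop` values.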

README.md ADDED

	@@ -0,0 +1,229 @@

+---
+license: apache-2.0
+base_model: Sunbird/Sunflower-32B
+tags:
+- quantized
+- gguf
+- llama.cpp
+- ollama
+- ugandan-languages
+- translation
+- qwen
+library_name: transformers
+pipeline_tag: text-generation
+language:
+- en
+- lg
+---
+# Sunflower 32B - GGUF
+GGUF quantized versions of the Sunflower model for Ugandan language translation tasks.
+## Model Details
+- **Base Model**: [Sunbird/Sunflower-32B](https://huggingface.co/Sunbird/Sunflower-32B)
+- **Model Size**: 32B parameters
+- **Architecture**: Qwen2.5
+- **Quantization**: llama.cpp quantization (K-quant and IQ types) with an importance matrix
+- **Languages**: English, Luganda, and other Ugandan languages
+## Available Files
+### Recommended Quantizations
+| Filename | Quant type | File Size | Description |
+| -------- | ---------- | --------- | ----------- |
+| sunflower-32B-f16.gguf | F16 | 62GB | Original precision |
+| sunflower-32B-q8_0.gguf | Q8_0 | 33GB | Highest quality quantized |
+| sunflower-32B-q6_k.gguf | Q6_K | 26GB | High quality |
+| sunflower-32B-q5_k_m.gguf | Q5_K_M | 22GB | Balanced quality/size |
+| sunflower-32B-q5_k_s.gguf | Q5_K_S | 19GB | Smaller Q5 variant |
+| sunflower-32B-q4_k_m.gguf | Q4_K_M | 19GB | **Recommended for most users** |
+### Warning: Experimental Quantizations
+The following quantizations achieve extreme compression but may significantly degrade translation quality. Use them for research and experimentation only.
+| Filename | Quant type | File Size | Compression (vs. F16) | Warning |
+| -------- | ---------- | --------- | --------------------- | ------- |
+| sunflower-32B-iq2_xxs.gguf | IQ2_XXS | 8.5GB | 86% smaller | May lose translation accuracy |
+| sunflower-32B-tq1_0.gguf | TQ1_0 | 7.2GB | 88% smaller | Experimental ternary quantization |
+| sunflower-32B-iq1_s.gguf | IQ1_S | 6.9GB | 89% smaller | **Extreme compression, quality heavily impacted** |
+**Note**: The experimental quantizations (IQ1_S, IQ2_XXS, TQ1_0) store weights at roughly 1.5 to 2 bits each, which may not preserve the specialized knowledge needed for Ugandan language translation. Test thoroughly before production use.
+### Additional Files
+| Filename | Description |
+| -------- | ----------- |
+| sunflower-imatrix.dat | Importance matrix data used for quantization |
+## Usage
+### llama.cpp
+```bash
+# Download model
+huggingface-cli download Sunbird/Sunflower-32B-GGUF sunflower-32B-q4_k_m.gguf --local-dir .
+# Run inference
+./llama-cli -m sunflower-32B-q4_k_m.gguf -p "Translate to Luganda: Hello, how are you today?"
+```
+## Ollama Integration
+Ollama runs these quantized models locally and exposes them through a simple HTTP API.
+### Installation and Setup
+```bash
+# Install Ollama (Linux)
+curl -fsSL https://ollama.ai/install.sh | sh
+# Or download the installer from https://ollama.ai for macOS and Windows
+# Start the Ollama server (runs in the foreground; skip this if Ollama is already running as a service)
+ollama serve
+```
+### Creating Modelfiles for Different Quantizations
+**Q4_K_M (Recommended) - Modelfile:**
+```bash
+cat > Modelfile.q4 << 'EOF'
+FROM ./sunflower-32B-q4_k_m.gguf
+# System prompt
+SYSTEM """You are a linguist and translator specializing in Ugandan languages, made by Sunbird AI."""
+# Chat template (ChatML format, matching the Qwen2.5 base model)
+TEMPLATE """<|im_start|>system
+{{ .System }}<|im_end|>
+<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+{{ .Response }}<|im_end|>"""
+# Stop tokens
+PARAMETER stop "<|im_start|>"
+PARAMETER stop "<|im_end|>"
+# Generation parameters
+PARAMETER temperature 0.3
+PARAMETER top_p 0.95
+PARAMETER top_k 40
+PARAMETER repeat_penalty 1.1
+PARAMETER num_ctx 4096
+PARAMETER num_predict 500
+EOF
+```
+**Experimental IQ1_S - Modelfile:**
+```bash
+cat > Modelfile.iq1s << 'EOF'
+FROM ./sunflower-32B-iq1_s.gguf
+SYSTEM """You are a translator for Ugandan languages. Note: This is an experimental ultra-compressed model - quality may be limited."""
+# Same chat template as above, with a reduced parameter set
+TEMPLATE """<|im_start|>system
+{{ .System }}<|im_end|>
+<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+{{ .Response }}<|im_end|>"""
+PARAMETER stop "<|im_start|>"
+PARAMETER stop "<|im_end|>"
+PARAMETER temperature 0.3
+PARAMETER top_p 0.95
+# Smaller context for the experimental model
+PARAMETER num_ctx 2048
+EOF
+```
+### Importing Models to Ollama
+```bash
+# Import Q4_K_M model (recommended)
+ollama create sunflower-32B:q4 -f Modelfile.q4
+# Import experimental IQ1_S model
+ollama create sunflower-32B:iq1s -f Modelfile.iq1s
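+# Import other quantizations (first create Modelfile.q5 and Modelfile.q6 as above, each with FROM pointing at the matching .gguf file)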
+ollama create sunflower-32B:q5 -f Modelfile.q5
+ollama create sunflower-32B:q6 -f Modelfile.q6
+# Verify models are imported
+ollama list
+```
+**Example output** (IDs and timestamps are placeholders):
+```
+NAME                    ID              SIZE    MODIFIED
+sunflower-32B:q4        abc123def       19GB    2 minutes ago
+sunflower-32B:iq1s      def456ghi       6.9GB   1 minute ago
+```
+### Using Ollama Models
+**Interactive Chat:**
+```bash
+# Start interactive session with Q4 model
+ollama run sunflower-32B:q4
+# Example conversation:
+# >>> Translate to Luganda: Hello, how are you today?
+# >>> Give a dictionary definition of the Samia term "ovulwaye" in English
+# >>> /bye (to exit)
+# Start with experimental model
+ollama run sunflower-32B:iq1s
+```
+**Single Prompt Inference:**
+```bash
+# Quick translation with Q4 model
+ollama run sunflower-32B:q4 "Translate to Luganda: People in villages rarely accept new technologies."
+# Test experimental model
+ollama run sunflower-32B:iq1s "Translate to Luganda: Good morning"
+# Dictionary definition
+ollama run sunflower-32B:q4 'Give a dictionary definition of the Samia term "ovulwaye" in English'
+```
+### Ollama API Usage
+**Check the API server:**
+```bash
+# Ollama serves its API on http://localhost:11434 by default
+# Test API endpoint
+curl http://localhost:11434/api/version
+```
+### Python (llama-cpp-python)
+```python
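+from llama_cpp import Llama
+# n_ctx mirrors num_ctx in the Modelfile; calling llm(...) directly defaults to max_tokens=16, which truncates translations
+llm = Llama(model_path="sunflower-32B-q4_k_m.gguf", n_ctx=4096)
+messages = [{"role": "user", "content": "Translate to Luganda: How are you?"}]
+# create_chat_completion applies the chat template stored in the GGUF metadata, when present
+result = llm.create_chat_completion(messages=messages, max_tokens=500, temperature=0.3)
+print(result["choices"][0]["message"]["content"])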
+```
+## Performance Notes
+- **Q4_K_M**: Recommended for most use cases
+- **Q5_K_M**: Better quality with moderate size increase
+- **Q6_K**: High quality for production use
+- **Q8_0**: Near-lossless quality
+## Technical Details
+Quantized with llama.cpp, using an importance matrix (`sunflower-imatrix.dat`) to calibrate which weights are preserved most precisely.
+## License
+Apache 2.0
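
The README's "Ollama API Usage" section only checks `/api/version`. As a minimal sketch of an actual generation call, the Python below posts to Ollama's `/api/generate` endpoint with streaming disabled. The helper names `build_request` and `generate` are hypothetical, and the model tag `sunflower-32B:q4` assumes the model was imported under that name as in the `ollama create` step.

```python
import json
import urllib.request

# Default local Ollama endpoint, as stated in the README.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("sunflower-32B:q4", "Translate to Luganda: Good morning")` should return the model's reply as a string. The long default timeout is a guess that allows for the first request loading a 32B model into memory.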

sunflower-32B-f16.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f3bf11bf1a48d518c87c3470befa60da2938425248cd01b0d7355ac0d45cb81
+size 65531574880
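
Each `.gguf` entry in this commit is a Git LFS pointer (`version`, `oid`, `size`) rather than the model itself. The sketch below, with the hypothetical helpers `parse_lfs_pointer` and `matches_pointer`, shows one way a downloaded file could be checked against its pointer. It assumes the pointer text is passed without the leading `+` diff markers and that the `oid` uses SHA-256, as every pointer here does.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte GGUF files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Comparing the size before hashing is a deliberate shortcut: an incomplete download of a file this large is rejected immediately instead of after a full read.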

sunflower-32B-iq1_s.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:855902c52f3751e8e7213ba2d4afa4ba1730cdfd0f8765892e7eb79ad29db26e
+size 7323841344

sunflower-32B-iq2_xxs.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d177208ccd8e903dc45eae108ad7f0c08ee0df1afcd43e2430d617787668ba3
+size 9019913024

sunflower-32B-q4_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d98ed70da6013450d481e97294f1f9b9a1ee2f31681db372977e821b5a62ecc7
+size 19762149184

sunflower-32B-q5_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4da275836c43b8c953dfe67b69ce5ec4b57ae4c155c2d2462a7cd777967801d6
+size 23214831424

sunflower-32B-q6_k.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:921283b7f6888ef90668109cc25d3ef5df64fc6fc97a617e22b3e6cd61c27ac7
+size 26883306304

sunflower-32B-q8_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:08e6fd7d168fa19792dd73d5970ea382f234d99bac7602acf180cc53bdd2a079
+size 34817719104

sunflower-32B-tq1_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7aef9f35a47f64c19f7c9312bbfd5a169ffcd42ec05251e576e662415170eeac
+size 7666825024

sunflower-imatrix.dat ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e2039b5eddef2933fcfb8258bb12478c25157142e49861e301adbd3c147947b
+size 15273216
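
The pointer sizes above are exact byte counts, so the rounded file sizes and compression percentages in the README can be cross-checked against them. The following sketch does that arithmetic; the helper names `gib` and `percent_smaller` are hypothetical, and the byte values are copied from the LFS pointers in this commit.

```python
# Byte sizes copied from the Git LFS pointers in this commit.
SIZES = {
    "f16": 65531574880,
    "q8_0": 34817719104,
    "q6_k": 26883306304,
    "q5_k_m": 23214831424,
    "q4_k_m": 19762149184,
    "iq2_xxs": 9019913024,
    "tq1_0": 7666825024,
    "iq1_s": 7323841344,
}


def gib(n_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB, 2**30 bytes)."""
    return n_bytes / 2**30


def percent_smaller(n_bytes: int, baseline: int = SIZES["f16"]) -> float:
    """Size reduction relative to the F16 file, in percent."""
    return 100.0 * (1 - n_bytes / baseline)


# Print one row per quantization: name, size in GiB, reduction vs. F16.
for name, n_bytes in SIZES.items():
    print(f"{name:8s} {gib(n_bytes):6.1f} GiB  {percent_smaller(n_bytes):5.1f}% smaller than f16")
```

By this calculation the Q4_K_M file is about 18.4 GiB, and the three experimental files are roughly 86%, 88%, and 89% smaller than F16.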