---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral
- tool-calling
- voice-assistant
- gguf
- lora
language:
- en
pipeline_tag: text-generation
---
# CAAL Ministral - Fine-tuned for Tool Calling
Fine-tuned [Ministral-3-8B](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) for accurate tool calling in the CAAL voice assistant.
## Results
- ✅ **100% tool-calling accuracy** (15/15 validation cases)
- ✅ **0% hallucinated answers**
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization
## Quick Start (Ollama)
```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
caal-ministral.gguf \
--local-dir .
# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf
PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE
# Import to Ollama
ollama create caal-ministral -f Modelfile
# Test
ollama run caal-ministral
```
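Once the model is imported, an application can query it over Ollama's HTTP API. The sketch below builds a non-streaming request payload for the `/api/generate` endpoint, reusing the model name and temperature from the Modelfile above; the actual `requests` call is shown commented out since it assumes a local Ollama server is running.

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt: str) -> dict:
    """Build a non-streaming generate request for the caal-ministral model."""
    return {
        "model": "caal-ministral",
        "prompt": prompt,
        "stream": False,
        # Low temperature, matching the Modelfile, for deterministic tool calls
        "options": {"temperature": 0.1, "num_ctx": 4096},
    }

payload = build_request("when is the next f1 race")
print(json.dumps(payload, indent=2))

# To query a running Ollama server (not executed here):
# import requests
# reply = requests.post(OLLAMA_URL, json=payload, timeout=120).json()
# print(reply["response"])
```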
## Training Details
- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with action parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on RTX 3060 12GB
- **Final Loss:** 0.126
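For reference, the stated hyperparameters can be collected into a plain config dict in roughly the shape Unsloth/PEFT LoRA setups expect. The `target_modules` list is an assumption (a common default for Mistral-family models), not something this card specifies:

```python
# Hypothetical LoRA config mirroring the card's stated hyperparameters.
lora_config = {
    "r": 16,                # LoRA rank (from the card)
    "lora_alpha": 16,       # LoRA alpha (from the card)
    "load_in_4bit": True,   # 4-bit base model to fit 12GB VRAM
    "num_train_epochs": 3,  # 3 epochs on an RTX 3060 12GB
    # Assumed target modules; typical for Mistral-style attention/MLP layers
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

# Scaling factor applied to the LoRA update: alpha / r
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 1.0 when alpha == r
```

With alpha equal to rank, the adapter update is applied at unit scale, a common choice that keeps the effective learning rate independent of rank.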
## Performance Comparison
| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|--------|---------|----------|---------------|
| Tool calling accuracy | ~80% | ~100% | **100%** |
| Hallucinated answers | ~20% | ~0% | **0%** |
| Speed | Fast | Slow | **Fast** |
| VRAM (with TTS) | 6GB | 14GB | **6GB** |
## Use Cases
Voice assistant tool calling:
- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events
## Validation Examples
**Successful tool calls (REST-style with action parameter):**
- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`
**General knowledge (no tool):**
- "what's the capital of France" → "Paris"
**Web search:**
- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`
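Because the tool format is plain Python-style call syntax, the model's output can be parsed into a function name plus keyword arguments with the standard `ast` module. This is an illustrative sketch, not the actual CAAL dispatcher:

```python
import ast

def parse_tool_call(text: str):
    """Parse a REST-style call like notion(action="add", task="pack my bag")."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return None  # not a bare tool call
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

print(parse_tool_call('espn_f1(action="schedule")'))
# ('espn_f1', {'action': 'schedule'})
```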
## Quantization Path
```
Training: 4-bit bnb (fits 12GB VRAM)
↓
Export: LoRA → GGUF
↓
Merge: Q4_K_M base + LoRA → F16
↓
Quantize: F16 → Q4_K_M (single clean quantization)
```
## Limitations
- Trained on REST-style tool format with action parameters
- Requires proper tool descriptions in system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases
## Hardware Requirements
**Inference:**
- GPU: 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- CPU: Compatible but slower
- RAM: 8GB minimum
## License
Apache 2.0 (matches base model)
## Citation
```bibtex
@misc{caal-ministral-2026,
author = {CoreWorxLab},
title = {CAAL Ministral: Fine-tuned Tool Calling Model},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```
## Links
- [Base Model](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
- [CAAL Project](https://github.com/CoreWorxLab/caal)
## Acknowledgments
Trained using [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA fine-tuning.