---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral
- tool-calling
- voice-assistant
- gguf
- lora
language:
- en
pipeline_tag: text-generation
---

# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned [Ministral-3-8B](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) for accurate tool calling in the CAAL voice assistant.

## Results

- ✅ **100% tool-calling accuracy** (15/15 validation cases)
- ✅ **0% hallucinated answers**
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization

## Quick Start (Ollama)

```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf
PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import into Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```

## Training Details

- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with action parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on an RTX 3060 12GB
- **Final Loss:** 0.126

## Performance Comparison

| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|--------|---------|----------|---------------|
| Tool-calling accuracy | ~80% | ~100% | **100%** |
| Hallucinated answers | ~20% | ~0% | **0%** |
| Speed | Fast | Slow | **Fast** |
| VRAM (with TTS) | 6GB | 14GB | **6GB** |

## Use Cases

Voice assistant tool calling:

- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events

## Validation Examples

**Successful tool calls (REST-style with action parameter):**

- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`

**General knowledge (no tool):**

- "what's the capital of France" → "Paris"

**Web search:**

- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`

## Quantization Path

```
Training: 4-bit bnb (fits 12GB VRAM)
    ↓
Export: LoRA → GGUF
    ↓
Merge: Q4_K_M base + LoRA → F16
    ↓
Quantize: F16 → Q4_K_M (single clean quantization)
```

## Limitations

- Trained on a REST-style tool format with action parameters
- Requires proper tool descriptions in the system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases

## Hardware Requirements

**Inference:**

- GPU: 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- CPU: Compatible but slower
- RAM: 8GB minimum

## License

Apache 2.0 (matches base model)

## Citation

```bibtex
@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```

## Links

- [Base Model](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
- [CAAL Project](https://github.com/CoreWorxLab/caal)

## Acknowledgments

Trained using [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA fine-tuning.
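## Appendix: Parsing the Tool-Call Format

The REST-style outputs shown in the validation examples (a function name with keyword arguments, e.g. `espn_f1(action="schedule")`) are plain Python-call syntax, so a client can parse them with the standard-library `ast` module. Below is a minimal sketch of such a parser; the function name `parse_tool_call` and its return convention are illustrative assumptions, not part of CAAL itself:

```python
import ast

def parse_tool_call(text: str):
    """Parse a REST-style tool call like espn_f1(action="schedule")
    into (tool_name, kwargs).

    Returns None when the text is not a tool call, e.g. a plain
    general-knowledge answer like "Paris".
    Hypothetical helper for illustration, not CAAL's actual parser.
    """
    try:
        node = ast.parse(text.strip(), mode="eval").body
    except SyntaxError:
        return None
    # Must be a simple call on a bare name: tool(arg=..., ...)
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        return None
    kwargs = {}
    for kw in node.keywords:
        # Reject **kwargs and any non-literal argument values
        if kw.arg is None or not isinstance(kw.value, ast.Constant):
            return None
        kwargs[kw.arg] = kw.value.value
    return node.func.id, kwargs

print(parse_tool_call('espn_f1(action="schedule")'))
# → ('espn_f1', {'action': 'schedule'})
print(parse_tool_call("Paris"))
# → None
```

Using `ast` instead of `eval` means the model's output is never executed, only inspected, which matters when the text comes from an LLM.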