---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral
- tool-calling
- voice-assistant
- gguf
- lora
language:
- en
pipeline_tag: text-generation
---
# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned Ministral-3-8B for accurate tool calling in the CAAL voice assistant.
## Results

- ✅ 100% tool-calling accuracy (15/15 validation cases)
- ✅ 0% hallucinated answers
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2 GB Q4_K_M quantization
## Quick Start (Ollama)

```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf
PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import to Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```
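Once imported, the model can also be queried over Ollama's HTTP API. Below is a minimal sketch of a `/api/chat` request payload, assuming Ollama's default `localhost:11434` endpoint; the `espn_epl` tool schema is illustrative, reconstructed from the validation examples rather than taken from the actual CAAL deployment:

```python
import json

# Hypothetical tool definition in the REST-style "action" format this
# model was trained on; adjust the schema to your own tools.
payload = {
    "model": "caal-ministral",
    "messages": [{"role": "user", "content": "Premier League scores"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "espn_epl",
            "description": "English Premier League scores and schedules",
            "parameters": {
                "type": "object",
                "properties": {
                    "action": {"type": "string", "enum": ["scores", "schedule"]},
                },
                "required": ["action"],
            },
        },
    }],
    "stream": False,
    "options": {"temperature": 0.1},  # low temp, per the Modelfile above
}

# POST this JSON to http://localhost:11434/api/chat
print(json.dumps(payload, indent=2))
```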
## Training Details

- Base Model: Ministral-3-8B-Instruct-2512 (4-bit)
- Method: LoRA (r=16, alpha=16)
- Dataset: 2,776 examples (tool calls, general knowledge, web search)
- Tool Format: REST-style with `action` parameter (e.g., `espn_epl(action="scores")`)
- Training: 3 epochs on an RTX 3060 12GB
- Final Loss: 0.126
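For context on why this fits a 12GB card: LoRA at rank r trains only two small matrices per adapted weight, A (r × d_in) and B (d_out × r), and alpha=16 with r=16 gives a scaling factor of 1.0. A quick sketch of the per-layer arithmetic (the 4096×4096 projection is an illustrative shape, not Ministral's exact geometry):

```python
def lora_param_count(d_out: int, d_in: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer:
    B is (d_out x r) plus A is (r x d_in)."""
    return d_out * r + r * d_in

# Illustrative square projection: ~131k adapter params vs ~16.8M full weights.
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, r=16)
print(adapter, f"{adapter / full:.2%} of the full layer")
```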
## Performance Comparison
| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|---|---|---|---|
| Tool calling accuracy | ~80% | ~100% | 100% |
| Hallucinated answers | ~20% | ~0% | 0% |
| Speed | Fast | Slow | Fast |
| VRAM (with TTS) | 6GB | 14GB | 6GB |
## Use Cases
Voice assistant tool calling:
- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events
## Validation Examples

Successful tool calls (REST-style with `action` parameter):
- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`
General knowledge (no tool):
- "what's the capital of France" → "Paris"

Web search:
- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`
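Because the model emits calls as plain Python-like expressions, a client can parse them safely with `ast` instead of `eval`. A minimal sketch of such a parser for this REST-style format (not the actual CAAL dispatcher):

```python
import ast

def parse_tool_call(text: str):
    """Parse a call like notion(action="add", task="pack my bag")
    into (tool_name, kwargs) without executing anything."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a tool call: {text!r}")
    kwargs = {
        kw.arg: ast.literal_eval(kw.value)  # only literal argument values
        for kw in node.keywords
    }
    return node.func.id, kwargs

name, kwargs = parse_tool_call('notion(action="add", task="pack my bag", due="tomorrow")')
print(name, kwargs)
# notion {'action': 'add', 'task': 'pack my bag', 'due': 'tomorrow'}
```

The parsed `(name, kwargs)` pair can then be routed to a handler registry keyed by tool name.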
## Quantization Path

```
Training:  4-bit bnb (fits 12GB VRAM)
    ↓
Export:    LoRA → GGUF
    ↓
Merge:     Q4_K_M base + LoRA → F16
    ↓
Quantize:  F16 → Q4_K_M (single clean quantization)
```
## Limitations
- Trained on REST-style tool format with action parameters
- Requires proper tool descriptions in system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases
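Since the model expects its tool descriptions in the system prompt, here is a sketch of how such a prompt might be assembled; the tool list and wording are illustrative, not the exact prompt used in training:

```python
# Hypothetical tool descriptions in the REST-style action format
# the model was trained on.
TOOLS = {
    "espn_epl": 'Premier League scores/schedules. Usage: espn_epl(action="scores"|"schedule")',
    "truenas": 'TrueNAS server status. Usage: truenas(action="status")',
    "web_search": 'Search the web. Usage: web_search(query="...")',
}

def build_system_prompt(persona: str, tools: dict) -> str:
    """Join the persona line with one usage line per available tool."""
    lines = [persona, "", "Available tools:"]
    lines += [f"- {name}: {desc}" for name, desc in tools.items()]
    lines.append("Respond with a single tool call when one applies.")
    return "\n".join(lines)

prompt = build_system_prompt(
    "You are CAAL, a witty, action-oriented voice assistant.", TOOLS
)
print(prompt)
```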
## Hardware Requirements
Inference:
- GPU: 6GB VRAM (runs alongside Kokoro TTS on 12GB card)
- CPU: Compatible but slower
- RAM: 8GB minimum
## License
Apache 2.0 (matches base model)
## Citation

```bibtex
@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```
## Acknowledgments
Trained using Unsloth for efficient LoRA fine-tuning.