# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned Ministral-3-8B for accurate tool calling in the CAAL voice assistant.

## Results
- ✅ 100% tool-calling accuracy (15/15 validation cases)
- ✅ 0% hallucinated answers
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization
## Quick Start (Ollama)

```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf
PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import to Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```
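Beyond the interactive `ollama run` shell, the imported model can be queried programmatically over Ollama's local HTTP API. The sketch below builds a `/api/chat` request using only the standard library; the tool description in the system prompt is a hypothetical placeholder, since the real tool list depends on your CAAL deployment.

```python
import json
import urllib.request

def build_request(prompt: str) -> urllib.request.Request:
    """Build a request against Ollama's local chat endpoint (default port 11434)."""
    payload = {
        "model": "caal-ministral",
        "stream": False,
        "options": {"temperature": 0.1},  # matches the Modelfile setting
        "messages": [
            # Hypothetical tool description; replace with your real tool list.
            {"role": "system",
             "content": 'Available tools: espn_epl(action="scores"|"schedule")'},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Premier League scores")
# urllib.request.urlopen(req) would return the model's reply as JSON
# when an Ollama server is running locally.
```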
## Training Details

- Base Model: Ministral-3-8B-Instruct-2512 (4-bit)
- Method: LoRA (r=16, alpha=16)
- Dataset: 2,776 examples (tool calls, general knowledge, web search)
- Tool Format: REST-style with `action` parameter (e.g., `espn_epl(action="scores")`)
- Training: 3 epochs on an RTX 3060 12GB
- Final Loss: 0.126
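To make the dataset shape concrete, here is a sketch of what a single supervised tool-calling example might look like as a JSONL record. The exact message schema used during training is an assumption; the point is that the target completion is the REST-style call itself.

```python
import json

# Hypothetical training record (schema is an assumption, not the actual
# dataset format): the assistant turn is the bare REST-style tool call.
example = {
    "messages": [
        {"role": "system",
         "content": "You are CAAL, a witty, action-oriented voice assistant. "
                    'Tools: espn_f1(action="schedule"|"scores")'},
        {"role": "user", "content": "when is the next f1 race"},
        {"role": "assistant", "content": 'espn_f1(action="schedule")'},
    ]
}

line = json.dumps(example)  # one line of a JSONL dataset file
```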
## Performance Comparison
| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|---|---|---|---|
| Tool calling accuracy | ~80% | ~100% | 100% |
| Hallucinated answers | ~20% | ~0% | 0% |
| Speed | Fast | Slow | Fast |
| VRAM (with TTS) | 6GB | 14GB | 6GB |
## Use Cases

Voice assistant tool calling:
- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events
## Validation Examples

Successful tool calls (REST-style with `action` parameter):

- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`

General knowledge (no tool):

- "what's the capital of France" → "Paris"

Web search:

- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`
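The host application has to distinguish tool calls like these from plain-text answers and extract the arguments. A minimal sketch, assuming the call format shown in the examples above (dispatch to the real integrations is up to the application):

```python
import re

# Matches REST-style calls such as truenas(action="status") or
# notion(action="add", task="pack my bag", due="tomorrow").
CALL_RE = re.compile(r'^(\w+)\((.*)\)$')
ARG_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_call(text: str):
    """Return (tool_name, args_dict), or None for a plain-text answer."""
    m = CALL_RE.match(text.strip())
    if not m:
        return None  # general-knowledge reply, no tool call
    name, argstr = m.groups()
    return name, dict(ARG_RE.findall(argstr))

print(parse_call('notion(action="add", task="pack my bag", due="tomorrow")'))
# → ('notion', {'action': 'add', 'task': 'pack my bag', 'due': 'tomorrow'})
```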
## Quantization Path

```
Training:  4-bit bnb (fits 12GB VRAM)
    ↓
Export:    LoRA → GGUF
    ↓
Merge:     Q4_K_M base + LoRA → F16
    ↓
Quantize:  F16 → Q4_K_M (single clean quantization)
```
## Limitations
- Trained on REST-style tool format with action parameters
- Requires proper tool descriptions in system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases
## Hardware Requirements

Inference:
- GPU: 6GB VRAM (runs alongside Kokoro TTS on 12GB card)
- CPU: Compatible but slower
- RAM: 8GB minimum
## License
Apache 2.0 (matches base model)
## Citation

```bibtex
@misc{caal-ministral-2026,
  author    = {CoreWorxLab},
  title     = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```
## Acknowledgments
Trained using Unsloth for efficient LoRA fine-tuning.