CAAL Ministral - Fine-tuned for Tool Calling

Ministral-3-8B fine-tuned for accurate tool calling in the CAAL voice assistant.

Results

  • 100% tool-calling accuracy (15/15 validation cases)
  • 0% hallucinated answers
  • Matches 14B performance at 8B speed
  • 5.2GB Q4_K_M quantization

Quick Start (Ollama)

# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf

PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096

SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import to Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
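
Once imported, the model can also be exercised programmatically through Ollama's chat API. Below is a minimal sketch in Python; the espn_epl tool schema is a hypothetical illustration of the REST-style format this model was trained on, not the actual CAAL tool definition.

import requests

# Ask for Premier League scores; the fine-tuned model should respond
# with a tool call rather than a hallucinated answer.
# NOTE: the espn_epl schema here is a hypothetical example.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "caal-ministral",
        "messages": [{"role": "user", "content": "Premier League scores"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "espn_epl",
                "description": "Premier League data from ESPN",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "action": {
                            "type": "string",
                            "description": "One of: scores, schedule",
                        },
                    },
                    "required": ["action"],
                },
            },
        }],
        "stream": False,
    },
)
print(resp.json()["message"].get("tool_calls"))
# Expected shape: [{'function': {'name': 'espn_epl', 'arguments': {'action': 'scores'}}}]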

Training Details

  • Base Model: Ministral-3-8B-Instruct-2512 (4-bit)
  • Method: LoRA (r=16, alpha=16)
  • Dataset: 2,776 examples (tool calls, general knowledge, web search)
  • Tool Format: REST-style with action parameter (e.g., espn_epl(action="scores"))
  • Training: 3 epochs on RTX 3060 12GB
  • Final Loss: 0.126
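
These hyperparameters map directly onto Unsloth's PEFT wrapper. The sketch below is a reconstruction from the bullet list above, not the actual training script; the Hugging Face model id, target modules, and sequence length are assumptions.

from unsloth import FastLanguageModel

# Load the base model in 4-bit so training fits a 12GB card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Ministral-3-8B-Instruct-2512",  # assumed HF id
    max_seq_length=4096,  # assumed; matches the num_ctx used at inference
    load_in_4bit=True,
)

# Attach LoRA adapters with the r=16, alpha=16 setup listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[  # typical attention/MLP projections; assumed
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)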

Performance Comparison

Metric                  Base 8B   Base 14B   Fine-tuned 8B
Tool-calling accuracy   ~80%      ~100%      100%
Hallucinated answers    ~20%      ~0%        0%
Speed                   Fast      Slow       Fast
VRAM (with TTS)         6GB       14GB       6GB

Use Cases

Voice assistant tool calling:

  • Smart home control (Home Assistant, TrueNAS)
  • Calendar/task management (Google, Notion)
  • Sports scores and schedules (ESPN)
  • Server status monitoring
  • Web search for current events

Validation Examples

Successful tool calls (REST-style with action parameter):

  • "when is the next f1 race" → espn_f1(action="schedule")
  • "check my truenas status" → truenas(action="status")
  • "add a notion task to pack my bag tomorrow" → notion(action="add", task="pack my bag", due="tomorrow")
  • "Premier League scores" → espn_epl(action="scores")

General knowledge (no tool):

  • "what's the capital of France" → "Paris"

Web search:

  • "Who is playing at the 2026 half-time show?" → web_search(query="2026 Super Bowl halftime show lineup")

Quantization Path

Training:   4-bit bnb (fits 12GB VRAM)
            ↓
Export:     LoRA → GGUF
            ↓
Merge:      Q4_K_M base + LoRA → F16
            ↓
Quantize:   F16 → Q4_K_M (single clean quantization)
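
The merge and quantize legs of this pipeline correspond to stock llama.cpp tools. The sketch below is an assumption about how those steps could be run, not the exact commands used here; filenames are placeholders, and it assumes llama-export-lora writes the merged weights at F16 as the diagram shows.

import subprocess

# Merge the GGUF LoRA adapter into the base model (F16 intermediate).
subprocess.run([
    "llama-export-lora",
    "-m", "ministral-base.gguf",         # base weights (placeholder path)
    "--lora", "caal-lora-adapter.gguf",  # adapter already converted to GGUF
    "-o", "caal-merged-f16.gguf",
], check=True)

# Single clean quantization pass: F16 -> Q4_K_M.
subprocess.run([
    "llama-quantize",
    "caal-merged-f16.gguf",
    "caal-ministral.gguf",
    "Q4_K_M",
], check=True)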

Limitations

  • Trained on REST-style tool format with action parameters
  • Requires proper tool descriptions in system prompt
  • Low temperature (0.1) recommended for deterministic behavior
  • Designed for voice assistant use cases
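
On the second point, the tool descriptions the model expects live in the system prompt. The snippet below is a hypothetical illustration of what such a prompt could look like; it is not the prompt CAAL actually ships with.

SYSTEM_PROMPT = """You are CAAL, a witty, action-oriented voice assistant.

Available tools (REST-style; call exactly one when needed):
- espn_epl(action): Premier League data. action: scores | schedule
- espn_f1(action): Formula 1 data. action: schedule | results
- truenas(action): TrueNAS server control. action: status | alerts
- notion(action, task, due): Task management. action: add | list
- web_search(query): Search the web for current events.

If no tool fits, answer directly. Never invent tool results."""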

Hardware Requirements

Inference:

  • GPU: 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
  • CPU: Compatible but slower
  • RAM: 8GB minimum

License

Apache 2.0 (matches base model)

Citation

@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}

Acknowledgments

Trained using Unsloth for efficient LoRA fine-tuning.
