---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
  - mistral
  - tool-calling
  - voice-assistant
  - gguf
  - lora
language:
  - en
pipeline_tag: text-generation
---

# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned Ministral-3-8B for accurate tool calling in the CAAL voice assistant.

## Results

- ✅ 100% tool-calling accuracy (15/15 validation cases)
- ✅ 0% hallucinated answers
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization

## Quick Start (Ollama)

```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf

PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096

SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import to Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```
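Once imported, the model can also be queried programmatically. A minimal sketch using only the Python standard library against Ollama's local HTTP API (assumes `ollama serve` is running and the model was created as above; the prompt is illustrative):

```python
# Sketch: querying the imported model over Ollama's local HTTP API
# (http://localhost:11434/api/generate) using only the standard library.
import json
import urllib.request

payload = {
    "model": "caal-ministral",
    "prompt": "Premier League scores",
    "stream": False,
    "options": {"temperature": 0.1},  # low temp for deterministic tool calls
}

def generate(body: dict) -> str:
    """Send one generate request and return the model's text response."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate(payload))  # for this prompt, a tool call is expected
```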

## Training Details

- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with an `action` parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on RTX 3060 12GB
- **Final Loss:** 0.126
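For context on why the adapter is so cheap to train, a back-of-envelope sketch of what a rank-16 LoRA adds per weight matrix (the 4096×4096 projection shape is a hypothetical example, not the exact Ministral layer dimensions):

```python
# LoRA replaces a frozen d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in), scaled by alpha/r. The r=16,
# alpha=16 values match this model's training config.
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters a rank-r adapter adds to one weight matrix."""
    return r * (d_out + d_in)

r, alpha = 16, 16
scaling = alpha / r                    # effective scale applied to B @ A
qproj = lora_params(4096, 4096, r)     # hypothetical attention projection

print(scaling)  # 1.0 -> adapter updates applied at full strength
print(qproj)    # 131072 trainable params vs ~16.8M frozen in that matrix
```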

## Performance Comparison

| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|---|---|---|---|
| Tool-calling accuracy | ~80% | ~100% | 100% |
| Hallucinated answers | ~20% | ~0% | 0% |
| Speed | Fast | Slow | Fast |
| VRAM (with TTS) | 6GB | 14GB | 6GB |

## Use Cases

Voice assistant tool calling:

- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events

## Validation Examples

Successful tool calls (REST-style with an `action` parameter):

- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`

General knowledge (no tool):

- "what's the capital of France" → "Paris"

Web search:

- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`
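A downstream dispatcher has to turn these strings back into a tool name and arguments. A minimal regex-based sketch (illustrative only; a production parser would validate against the registered tool schemas):

```python
import re

# Matches the model's REST-style output, e.g. 'espn_epl(action="scores")'.
CALL_RE = re.compile(r'^(\w+)\((.*)\)$')
ARG_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_tool_call(text: str):
    """Return (tool_name, kwargs) for a tool call, or None for plain text."""
    m = CALL_RE.match(text.strip())
    if not m:
        return None  # general-knowledge answer, no tool invoked
    name, arg_str = m.groups()
    return name, dict(ARG_RE.findall(arg_str))

print(parse_tool_call('espn_epl(action="scores")'))
# ('espn_epl', {'action': 'scores'})
```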

## Quantization Path

```text
Training:   4-bit bnb (fits 12GB VRAM)
            ↓
Export:     LoRA → GGUF
            ↓
Merge:      Q4_K_M base + LoRA → F16
            ↓
Quantize:   F16 → Q4_K_M (single clean quantization)
```

## Limitations

- Trained on a REST-style tool format with `action` parameters
- Requires proper tool descriptions in the system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases
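Since the model expects tool descriptions in the system prompt, one way to assemble them might look like the sketch below (the tool names and descriptions are illustrative, not the full CAAL tool set):

```python
# Sketch: building a system prompt that advertises the available tools.
TOOLS = {
    "espn_epl": "Premier League scores and schedules (action: scores|schedule)",
    "truenas": "TrueNAS server status (action: status)",
}

def build_system_prompt(persona: str, tools: dict) -> str:
    """Combine the assistant persona with one line per registered tool."""
    lines = [persona, "", "Available tools:"]
    lines += [f"- {name}: {desc}" for name, desc in tools.items()]
    return "\n".join(lines)

prompt = build_system_prompt(
    "You are CAAL, a witty, action-oriented voice assistant.", TOOLS)
print(prompt)
```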

## Hardware Requirements

Inference:

- **GPU:** 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- **CPU:** Compatible but slower
- **RAM:** 8GB minimum

## License

Apache 2.0 (matches base model)

## Citation

```bibtex
@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```

## Acknowledgments

Trained using Unsloth for efficient LoRA fine-tuning.