---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
  - mistral
  - tool-calling
  - voice-assistant
  - gguf
  - lora
language:
  - en
pipeline_tag: text-generation
---

# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned Ministral-3-8B for accurate tool calling in the CAAL voice assistant.

## Results

- ✅ 100% tool-calling accuracy (15/15 validation cases)
- ✅ 0% hallucinated answers
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization

## Quick Start (Ollama)

```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf

PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096

SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import to Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```
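Once imported, the model can also be queried programmatically. A minimal sketch using only the Python standard library against Ollama's local HTTP API (assumes `ollama serve` is running and the model was created as above; the prompt is illustrative):

```python
# Sketch: querying the imported model over Ollama's local HTTP API
# (http://localhost:11434/api/generate) using only the standard library.
import json
import urllib.request

payload = {
    "model": "caal-ministral",
    "prompt": "Premier League scores",
    "stream": False,
    "options": {"temperature": 0.1},  # low temp for deterministic tool calls
}

def generate(body: dict) -> str:
    """Send one generate request and return the model's text response."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate(payload))  # for this prompt, a tool call is expected
```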

## Training Details

- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with an `action` parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on RTX 3060 12GB
- **Final Loss:** 0.126
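For context on why the adapter is so cheap to train, a back-of-envelope sketch of what a rank-16 LoRA adds per weight matrix (the 4096×4096 projection shape is a hypothetical example, not the exact Ministral layer dimensions):

```python
# LoRA replaces a frozen d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in), scaled by alpha/r. The r=16,
# alpha=16 values match this model's training config.
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters a rank-r adapter adds to one weight matrix."""
    return r * (d_out + d_in)

r, alpha = 16, 16
scaling = alpha / r                    # effective scale applied to B @ A
qproj = lora_params(4096, 4096, r)     # hypothetical attention projection

print(scaling)  # 1.0 -> adapter updates applied at full strength
print(qproj)    # 131072 trainable params vs ~16.8M frozen in that matrix
```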

## Performance Comparison

| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|---|---|---|---|
| Tool-calling accuracy | ~80% | ~100% | 100% |
| Hallucinated answers | ~20% | ~0% | 0% |
| Speed | Fast | Slow | Fast |
| VRAM (with TTS) | 6GB | 14GB | 6GB |

## Use Cases

Voice assistant tool calling:

- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events

## Validation Examples

Successful tool calls (REST-style with an `action` parameter):

- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`

General knowledge (no tool):

- "what's the capital of France" → "Paris"

Web search:

- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`
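A downstream dispatcher has to turn these strings back into a tool name and arguments. A minimal regex-based sketch (illustrative only; a production parser would validate against the registered tool schemas):

```python
import re

# Matches the model's REST-style output, e.g. 'espn_epl(action="scores")'.
CALL_RE = re.compile(r'^(\w+)\((.*)\)$')
ARG_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_tool_call(text: str):
    """Return (tool_name, kwargs) for a tool call, or None for plain text."""
    m = CALL_RE.match(text.strip())
    if not m:
        return None  # general-knowledge answer, no tool invoked
    name, arg_str = m.groups()
    return name, dict(ARG_RE.findall(arg_str))

print(parse_tool_call('espn_epl(action="scores")'))
# ('espn_epl', {'action': 'scores'})
```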

## Quantization Path

```text
Training:   4-bit bnb (fits 12GB VRAM)
            ↓
Export:     LoRA → GGUF
            ↓
Merge:      Q4_K_M base + LoRA → F16
            ↓
Quantize:   F16 → Q4_K_M (single clean quantization)
```

## Limitations

- Trained on a REST-style tool format with `action` parameters
- Requires proper tool descriptions in the system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases
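Since the model expects tool descriptions in the system prompt, one way to assemble them might look like the sketch below (the tool names and descriptions are illustrative, not the full CAAL tool set):

```python
# Sketch: building a system prompt that advertises the available tools.
TOOLS = {
    "espn_epl": "Premier League scores and schedules (action: scores|schedule)",
    "truenas": "TrueNAS server status (action: status)",
}

def build_system_prompt(persona: str, tools: dict) -> str:
    """Combine the assistant persona with one line per registered tool."""
    lines = [persona, "", "Available tools:"]
    lines += [f"- {name}: {desc}" for name, desc in tools.items()]
    return "\n".join(lines)

prompt = build_system_prompt(
    "You are CAAL, a witty, action-oriented voice assistant.", TOOLS)
print(prompt)
```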

## Hardware Requirements

Inference:

- **GPU:** 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- **CPU:** Compatible but slower
- **RAM:** 8GB minimum

## License

Apache 2.0 (matches base model)

## Citation

```bibtex
@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```

## Acknowledgments

Trained using Unsloth for efficient LoRA fine-tuning.