---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral
- tool-calling
- voice-assistant
- gguf
- lora
language:
- en
pipeline_tag: text-generation
---

# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned [Ministral-3-8B](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) for accurate tool calling in the CAAL voice assistant.

## Results

- ✅ **100% tool-calling accuracy** (15/15 validation cases)
- ✅ **0% hallucinated answers**
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization

## Quick Start (Ollama)

```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral.gguf \
  --local-dir .

# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf
PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import into Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```

## Training Details

- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with action parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on an RTX 3060 12GB
- **Final Loss:** 0.126

## Performance Comparison

| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|--------|---------|----------|---------------|
| Tool-calling accuracy | ~80% | ~100% | **100%** |
| Hallucinated answers | ~20% | ~0% | **0%** |
| Speed | Fast | Slow | **Fast** |
| VRAM (with TTS) | 6GB | 14GB | **6GB** |

## Use Cases

Voice assistant tool calling:

- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events

## Validation Examples

**Successful tool calls (REST-style with action parameter):**

- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`

**General knowledge (no tool):**

- "what's the capital of France" → "Paris"

**Web search:**

- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`

## Quantization Path

```
Training: 4-bit bnb (fits 12GB VRAM)
    ↓
Export: LoRA → GGUF
    ↓
Merge: Q4_K_M base + LoRA → F16
    ↓
Quantize: F16 → Q4_K_M (single clean quantization)
```

## Limitations

- Trained on a REST-style tool format with action parameters
- Requires proper tool descriptions in the system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases

## Hardware Requirements

**Inference:**

- GPU: 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- CPU: Compatible but slower
- RAM: 8GB minimum

## License

Apache 2.0 (matches base model)

## Citation

```bibtex
@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```

## Links

- [Base Model](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
- [CAAL Project](https://github.com/CoreWorxLab/caal)

## Acknowledgments

Trained using [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA fine-tuning.
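## Appendix: Parsing the Tool-Call Format

The REST-style outputs shown in the validation examples (a function name with keyword arguments, e.g. `espn_f1(action="schedule")`) are plain Python-call syntax, so a client can parse them with the standard-library `ast` module. Below is a minimal sketch of such a parser; the function name `parse_tool_call` and its return convention are illustrative assumptions, not part of CAAL itself:

```python
import ast

def parse_tool_call(text: str):
    """Parse a REST-style tool call like espn_f1(action="schedule")
    into (tool_name, kwargs).

    Returns None when the text is not a tool call, e.g. a plain
    general-knowledge answer like "Paris".
    Hypothetical helper for illustration, not CAAL's actual parser.
    """
    try:
        node = ast.parse(text.strip(), mode="eval").body
    except SyntaxError:
        return None
    # Must be a simple call on a bare name: tool(arg=..., ...)
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        return None
    kwargs = {}
    for kw in node.keywords:
        # Reject **kwargs and any non-literal argument values
        if kw.arg is None or not isinstance(kw.value, ast.Constant):
            return None
        kwargs[kw.arg] = kw.value.value
    return node.func.id, kwargs

print(parse_tool_call('espn_f1(action="schedule")'))
# → ('espn_f1', {'action': 'schedule'})
print(parse_tool_call("Paris"))
# → None
```

Using `ast` instead of `eval` means the model's output is never executed, only inspected, which matters when the text comes from an LLM.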