---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral
- tool-calling
- voice-assistant
- gguf
- lora
language:
- en
pipeline_tag: text-generation
---
# CAAL Ministral - Fine-tuned for Tool Calling
Fine-tuned [Ministral-3-8B](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) for accurate tool calling in the CAAL voice assistant.
## Results
- ✅ **100% tool-calling accuracy** (15/15 validation cases)
- ✅ **0% hallucinated answers**
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization
## Quick Start (Ollama)
```bash
# Download model
huggingface-cli download CoreWorxLab/caal-ministral \
caal-ministral.gguf \
--local-dir .
# Create Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral.gguf
PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE
# Import to Ollama
ollama create caal-ministral -f Modelfile
# Test
ollama run caal-ministral
```
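Once the model is imported, an application can query it over Ollama's HTTP API. The sketch below builds a non-streaming request payload for the `/api/generate` endpoint, reusing the model name and temperature from the Modelfile above; the actual `requests` call is shown commented out since it assumes a local Ollama server is running.

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt: str) -> dict:
    """Build a non-streaming generate request for the caal-ministral model."""
    return {
        "model": "caal-ministral",
        "prompt": prompt,
        "stream": False,
        # Low temperature, matching the Modelfile, for deterministic tool calls
        "options": {"temperature": 0.1, "num_ctx": 4096},
    }

payload = build_request("when is the next f1 race")
print(json.dumps(payload, indent=2))

# To query a running Ollama server (not executed here):
# import requests
# reply = requests.post(OLLAMA_URL, json=payload, timeout=120).json()
# print(reply["response"])
```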
## Training Details
- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with action parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on RTX 3060 12GB
- **Final Loss:** 0.126
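For reference, the stated hyperparameters can be collected into a plain config dict in roughly the shape Unsloth/PEFT LoRA setups expect. The `target_modules` list is an assumption (a common default for Mistral-family models), not something this card specifies:

```python
# Hypothetical LoRA config mirroring the card's stated hyperparameters.
lora_config = {
    "r": 16,                # LoRA rank (from the card)
    "lora_alpha": 16,       # LoRA alpha (from the card)
    "load_in_4bit": True,   # 4-bit base model to fit 12GB VRAM
    "num_train_epochs": 3,  # 3 epochs on an RTX 3060 12GB
    # Assumed target modules; typical for Mistral-style attention/MLP layers
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

# Scaling factor applied to the LoRA update: alpha / r
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 1.0 when alpha == r
```

With alpha equal to rank, the adapter update is applied at unit scale, a common choice that keeps the effective learning rate independent of rank.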
## Performance Comparison
| Metric | Base 8B | Base 14B | Fine-tuned 8B |
|--------|---------|----------|---------------|
| Tool calling accuracy | ~80% | ~100% | **100%** |
| Hallucinated answers | ~20% | ~0% | **0%** |
| Speed | Fast | Slow | **Fast** |
| VRAM (with TTS) | 6GB | 14GB | **6GB** |
## Use Cases
Voice assistant tool calling:
- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events
## Validation Examples
**Successful tool calls (REST-style with action parameter):**
- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`
**General knowledge (no tool):**
- "what's the capital of France" → "Paris"
**Web search:**
- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`
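Because the tool format is plain Python-style call syntax, the model's output can be parsed into a function name plus keyword arguments with the standard `ast` module. This is an illustrative sketch, not the actual CAAL dispatcher:

```python
import ast

def parse_tool_call(text: str):
    """Parse a REST-style call like notion(action="add", task="pack my bag")."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return None  # not a bare tool call
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

print(parse_tool_call('espn_f1(action="schedule")'))
# ('espn_f1', {'action': 'schedule'})
```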
## Quantization Path
```
Training: 4-bit bnb (fits 12GB VRAM)
↓
Export: LoRA → GGUF
↓
Merge: Q4_K_M base + LoRA → F16
↓
Quantize: F16 → Q4_K_M (single clean quantization)
```
## Limitations
- Trained on REST-style tool format with action parameters
- Requires proper tool descriptions in system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases
## Hardware Requirements
**Inference:**
- GPU: 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- CPU: Compatible but slower
- RAM: 8GB minimum
## License
Apache 2.0 (matches base model)
## Citation
```bibtex
@misc{caal-ministral-2026,
author = {CoreWorxLab},
title = {CAAL Ministral: Fine-tuned Tool Calling Model},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```
## Links
- [Base Model](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
- [CAAL Project](https://github.com/CoreWorxLab/caal)
## Acknowledgments
Trained using [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA fine-tuning.