cmac86 committed · Commit d6ede03 · verified · Parent(s): 197df07

Upload README.md with huggingface_hub

Files changed (1): README.md (+142 -0)
---
license: apache-2.0
base_model: mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral
- tool-calling
- voice-assistant
- gguf
- lora
language:
- en
pipeline_tag: text-generation
---

# CAAL Ministral - Fine-tuned for Tool Calling

Fine-tuned [Ministral-3-8B](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) for accurate tool calling in the CAAL voice assistant.

## Results

- ✅ **100% tool-calling accuracy** (15/15 validation cases)
- ✅ **0% hallucinated answers**
- ✅ Matches 14B performance at 8B speed
- ✅ 5.2GB Q4_K_M quantization

## Quick Start (Ollama)

```bash
# Download the model
huggingface-cli download CoreWorxLab/caal-ministral \
  caal-ministral-Q4_K_M.gguf \
  --local-dir .

# Create a Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./caal-ministral-Q4_K_M.gguf

PARSER ministral
PARAMETER temperature 0.1
PARAMETER num_ctx 4096

SYSTEM """You are CAAL, a witty, action-oriented voice assistant."""
MODELFILE

# Import into Ollama
ollama create caal-ministral -f Modelfile

# Test
ollama run caal-ministral
```
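
Once imported, the model can also be driven programmatically: recent Ollama releases expose a `/api/chat` REST endpoint that accepts OpenAI-style tool schemas in a `tools` field. The sketch below only builds the JSON request body (no network call); the `espn_epl` schema is an illustrative guess at a CAAL tool definition, not taken from the project.

```python
import json

def build_chat_request(model: str, user_text: str, tools: list) -> str:
    """Build a JSON body for Ollama's /api/chat endpoint (no network call here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,  # OpenAI-style function schemas
        "stream": False,
        "options": {"temperature": 0.1, "num_ctx": 4096},  # mirror the Modelfile
    }
    return json.dumps(payload)

# Illustrative tool schema (hypothetical -- not taken from the CAAL project):
espn_epl_tool = {
    "type": "function",
    "function": {
        "name": "espn_epl",
        "description": "Premier League scores and schedules",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string", "enum": ["scores", "schedule"]},
            },
            "required": ["action"],
        },
    },
}

body = build_chat_request("caal-ministral", "Premier League scores", [espn_epl_tool])
```

POST the resulting body to `http://localhost:11434/api/chat`; when the model opts to call a tool, the reply's `message.tool_calls` entries carry the function name and arguments.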

## Training Details

- **Base Model:** Ministral-3-8B-Instruct-2512 (4-bit)
- **Method:** LoRA (r=16, alpha=16)
- **Dataset:** 2,776 examples (tool calls, general knowledge, web search)
- **Tool Format:** REST-style with an action parameter (e.g., `espn_epl(action="scores")`)
- **Training:** 3 epochs on an RTX 3060 12GB
- **Final Loss:** 0.126

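For intuition on why this fits a 12GB card: LoRA at r=16 freezes the base weights and trains only two low-rank factors per adapted projection, so the trainable parameters for one layer drop from `d_out * d_in` to `r * (d_in + d_out)`. A quick sketch (the 4096x4096 projection size is illustrative, not read from the model config):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r

# Illustrative projection size (not read from the model config):
d_in = d_out = 4096
full = d_in * d_out                      # full fine-tune of this layer: 16,777,216 weights
lora = lora_params(d_in, d_out, r=16)    # LoRA at r=16: 131,072 trainable weights
print(f"LoRA trains {lora / full:.2%} of this layer's weights")  # -> 0.78%
```
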
## Performance Comparison

| Metric                | Base 8B | Base 14B | Fine-tuned 8B |
|-----------------------|---------|----------|---------------|
| Tool-calling accuracy | ~80%    | ~100%    | **100%**      |
| Hallucinated answers  | ~20%    | ~0%      | **0%**        |
| Speed                 | Fast    | Slow     | **Fast**      |
| VRAM (with TTS)       | 6GB     | 14GB     | **6GB**       |

## Use Cases

Voice assistant tool calling:

- Smart home control (Home Assistant, TrueNAS)
- Calendar/task management (Google, Notion)
- Sports scores and schedules (ESPN)
- Server status monitoring
- Web search for current events

## Validation Examples

**Successful tool calls (REST-style with action parameter):**

- "when is the next f1 race" → `espn_f1(action="schedule")`
- "check my truenas status" → `truenas(action="status")`
- "add a notion task to pack my bag tomorrow" → `notion(action="add", task="pack my bag", due="tomorrow")`
- "Premier League scores" → `espn_epl(action="scores")`

**General knowledge (no tool):**

- "what's the capital of France" → "Paris"

**Web search:**

- "Who is playing at the 2026 half-time show?" → `web_search(query="2026 Super Bowl halftime show lineup")`

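Downstream code can parse this REST-style output with a few lines. The sketch below assumes the grammar visible in the examples above — a function name followed by comma-separated `key="value"` pairs — and may not cover every string the model emits:

```python
import re

CALL_RE = re.compile(r'^(\w+)\((.*)\)$')   # name(...)
ARG_RE = re.compile(r'(\w+)="([^"]*)"')    # key="value" pairs

def parse_tool_call(text: str):
    """Split e.g. 'truenas(action="status")' into ('truenas', {'action': 'status'}).

    Returns None for plain-text replies (no tool call).
    """
    m = CALL_RE.match(text.strip())
    if m is None:
        return None
    name, arg_str = m.group(1), m.group(2)
    return name, dict(ARG_RE.findall(arg_str))
```

A `None` result signals a direct spoken answer (like the "Paris" example), so the assistant can route tool calls and plain replies separately.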
## Quantization Path

```
Training: 4-bit bnb (fits 12GB VRAM)
        ↓
Export: LoRA → GGUF
        ↓
Merge: Q4_K_M base + LoRA → F16
        ↓
Quantize: F16 → Q4_K_M (single clean quantization)
```

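The 5.2GB file size lines up with Q4_K_M's mixed 4-bit/6-bit blocks, which land around 5 effective bits per weight once scales and higher-precision tensors are counted. A back-of-envelope check (the ~5.2 bits-per-weight figure is an assumption, not read from the GGUF header):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameter count times average bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# ~8B parameters at an assumed ~5.2 effective bits/weight for Q4_K_M:
size = gguf_size_gb(8e9, 5.2)
print(f"~{size:.1f} GB")  # -> ~5.2 GB
```
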
## Limitations

- Trained on a REST-style tool format with action parameters
- Requires proper tool descriptions in the system prompt
- Low temperature (0.1) recommended for deterministic behavior
- Designed for voice assistant use cases

## Hardware Requirements

**Inference:**

- GPU: 6GB VRAM (runs alongside Kokoro TTS on a 12GB card)
- CPU: compatible, but slower
- RAM: 8GB minimum

## License

Apache 2.0 (matches the base model)

## Citation

```bibtex
@misc{caal-ministral-2026,
  author = {CoreWorxLab},
  title = {CAAL Ministral: Fine-tuned Tool Calling Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CoreWorxLab/caal-ministral}
}
```

## Links

- [Base Model](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
- [CAAL Project](https://github.com/CoreWorxLab/caal)

## Acknowledgments

Trained with [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA fine-tuning.