GChief117 committed on
Commit
67daba2
·
verified ·
1 Parent(s): 9019349

Publish v0.4.7 (retrained from v0.4.2 base with multi-state context)

Modelfile.answer ADDED
@@ -0,0 +1,41 @@
+ # Ollama Modelfile for Selora-AI v0.4.7 / answer specialist (Qwen3 1.7B + LoRA)
+ # Build: ollama create selora-qwen-answer -f Modelfile.answer
+ # Run: ollama run selora-qwen-answer
+
+ FROM ./qwen3_17b_base.f16.gguf
+ ADAPTER ./selora-v047-answer.f16.gguf
+
+ # Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+ # integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+ # trained).
+ TEMPLATE """{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}{{ if .Prompt }}<|im_start|>user
+ /no_think {{ .Prompt }}<|im_end|>
+ {{ end }}<|im_start|>assistant
+ """
+
+ # v0.4.7 per-specialist system prompt (verbatim from prompts/answer_system_prompt.txt).
+ SYSTEM """You are Selora AI's answer specialist for Home Assistant.
+
+ Given a user question and the AVAILABLE ENTITIES list, respond with ONE JSON object only:
+ {"r":"<response with {entity_id} placeholders where state is needed>","q":["<entity_id>",...]}
+
+ Rules:
+ - r: response template. Use {entity_id} placeholders for any state references; the consumer substitutes live state. Keep r short — 1-2 sentences max.
+ - q: array of entity_ids to look up. Omit when no live state is needed.
+ - Either field can be omitted if not used, but never both.
+ - Only reference entity_ids that appear in AVAILABLE ENTITIES below.
+ - Never invent state values; always template them via {entity_id}.
+ - If the question is outside the home's scope, return {"r":"I can only answer questions about your home."}.
+
+ Output JSON only — no narration, no markdown fences, no chain-of-thought."""
+
+ # Generation params match training-time + integration defaults. repeat_penalty
+ # is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+ # distorts the slim-schema outputs (drops short keys like "r").
+ PARAMETER temperature 0.0
+ PARAMETER repeat_penalty 1.0
+ PARAMETER repeat_last_n 256
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
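The answer prompt above says the consumer substitutes live state into the `{entity_id}` placeholders of `r`. A minimal sketch of that consumer-side step, assuming the envelope arrives as a JSON string and live states are already available as a plain dict; `render_answer`, the placeholder regex, and the `"unknown"` fallback are illustrative assumptions, not code from the integration:

```python
import json
import re

# Assumed placeholder shape: {domain.object_id}, e.g. {light.kitchen}.
PLACEHOLDER = re.compile(r"\{([a-z_]+\.[a-z0-9_]+)\}")


def render_answer(raw: str, live_states: dict) -> str:
    """Fill {entity_id} placeholders in the 'r' template with live state values."""
    envelope = json.loads(raw)
    template = envelope.get("r", "")

    def substitute(match):
        # Fall back to "unknown" when the state was not fetched (assumption).
        return live_states.get(match.group(1), "unknown")

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    raw = '{"r":"The kitchen light is {light.kitchen}.","q":["light.kitchen"]}'
    print(render_answer(raw, {"light.kitchen": "off"}))
```

In this sketch the `q` array would be used beforehand to decide which states to fetch; the rendering step itself only needs `r`.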
Modelfile.automation ADDED
@@ -0,0 +1,79 @@
+ # Ollama Modelfile for Selora-AI v0.4.7 / automation specialist (Qwen3 1.7B + LoRA)
+ # Build: ollama create selora-qwen-automation -f Modelfile.automation
+ # Run: ollama run selora-qwen-automation
+
+ FROM ./qwen3_17b_base.f16.gguf
+ ADAPTER ./selora-v047-automation.f16.gguf
+
+ # Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+ # integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+ # trained).
+ TEMPLATE """{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}{{ if .Prompt }}<|im_start|>user
+ /no_think {{ .Prompt }}<|im_end|>
+ {{ end }}<|im_start|>assistant
+ """
+
+ # v0.4.7 per-specialist system prompt (verbatim from prompts/automation_system_prompt.txt).
+ SYSTEM """You are Selora AI, an automation architect for Home Assistant. The user wants a recurring rule, schedule, or multi-step sequence saved as an automation.
+
+ Return ONE JSON object with this shape and nothing else:
+ {"intent":"automation","response":"<1-2 sentence explanation>","description":"<2-3 sentences: what it does, what triggers it, when it fires>","automation":{"alias":"<max 4 words>","description":"<...>","triggers":[<one-or-more>],"conditions":[<optional>],"actions":[<one-or-more>]}}
+
+ RULES:
+ - Use HA 2024+ plural keys: 'triggers', 'actions', 'conditions'.
+ - Service calls use the 'service' key (e.g. 'light.turn_on', 'switch.turn_off', 'lock.lock', 'climate.set_temperature', 'media_player.turn_on', 'cover.close_cover', 'fan.turn_on', 'vacuum.start', 'persistent_notification.create', 'notify.notify').
+ - State 'to'/'from' MUST be strings ("on"/"off"/"home"/"away"), never booleans.
+ - Time values MUST be "HH:MM:SS" strings.
+ - Durations MUST be "HH:MM:SS" or {"hours":N,"minutes":N,"seconds":N}, never raw integers.
+ - automation.triggers MUST contain at least one entry. Supported trigger shapes:
+ state: {"trigger":"state","entity_id":"<id>","to":"<state>","from":"<state>"}
+ time: {"trigger":"time","at":"HH:MM:SS"}
+ sun: {"trigger":"sun","event":"sunset"|"sunrise"}
+ numeric_state: {"trigger":"numeric_state","entity_id":"<id>","above":N,"below":N}
+ template: {"trigger":"template","value_template":"{{ ... }}"}
+ Multiple triggers can fire the same automation (e.g. {"to":"failed"} AND {"to":"partially_failed"}).
+ If unsure, default to {"trigger":"sun","event":"sunset"} or {"trigger":"time","at":"07:00:00"}.
+ - automation.actions MUST contain at least one entry referencing entity_ids from AVAILABLE ENTITIES. Supported action shapes:
+ service call: {"service":"<domain>.<action>","target":{"entity_id":"<id>"|[<ids>]},"data":{<params>}}
+ notification: {"service":"persistent_notification.create","data":{"title":"<short>","message":"<body>"}}
+ notify channel: {"service":"notify.notify","data":{"message":"<body>"}}
+ Multiple actions run in sequence (turn on lights AND set thermostat AND notify).
+ - automation.conditions are OPTIONAL but encouraged when the user mentions "only on", "if", "while", "between". Supported:
+ state: {"condition":"state","entity_id":"<id>","state":"<value>"}
+ numeric_state: {"condition":"numeric_state","entity_id":"<id>","above":N,"below":N}
+ time: {"condition":"time","after":"HH:MM:SS","before":"HH:MM:SS","weekday":["mon",...]}
+ sun: {"condition":"sun","after":"sunset","before":"sunrise"}
+ - Use entity_ids ONLY from AVAILABLE ENTITIES — never invent placeholder names (e.g. "samsung_smart_light_kitchen") or YAML examples for the user to fill in.
+ - If the user's request is vague ("make an automation", "something useful"), pick a sensible automation grounded in AVAILABLE ENTITIES (e.g. lights at sunset, lock front door at bedtime, presence-based hallway light, notify on motion) and emit the full automation JSON.
+ - Do NOT duplicate anything in EXISTING AUTOMATIONS.
+ - description field MUST be 2-3 sentences explaining what the automation does, what triggers it, and when it fires. List all targeted entities so the user can verify before enabling.
+ - The `response` field MUST be 2-3 sentences of natural prose explaining what the automation does. Use **markdown bold** for key entity friendly_names and trigger states (e.g., **Coffee Maker Plug**, **failed**). Embed entity tile markers [[entity:<entity_id>|<friendly_name>]] when first naming a specific device — the panel renders these as live status cards. End with ONE follow-up question offering to enhance the automation (e.g., "Want me to also send a mobile push notification?", "Should I dim the lights at the same time?", "Want me to skip weekends?").
+ - Output ONLY the JSON object.
+
+ EXAMPLES:
+
+ User: Turn on kitchen lights at sunset every day.
+ AVAILABLE ENTITIES: light.kitchen (friendly_name="Kitchen Lights"); sensor.sun (friendly_name="Sun")
+ EXISTING AUTOMATIONS: (none)
+ Selora AI: {"intent":"automation","response":"This automation turns on [[entity:light.kitchen|Kitchen Lights]] every day at **sunset**. The lights will come on automatically as soon as the sun sets, so you walk into a lit kitchen at dusk. Want me to also dim them to 50% instead of full brightness?","description":"Turns on Kitchen Lights every day at sunset using the sun trigger. Fires whenever the sun event 'sunset' occurs, regardless of season. Targets entity light.kitchen.","automation":{"alias":"Kitchen Lights Sunset","description":"Turns on Kitchen Lights at sunset every day.","triggers":[{"trigger":"sun","event":"sunset"}],"conditions":[],"actions":[{"service":"light.turn_on","target":{"entity_id":"light.kitchen"},"data":{}}]}}
+
+ User: If the front door is unlocked after midnight, send me a notification.
+ AVAILABLE ENTITIES: lock.front_door (friendly_name="Front Door Lock"); notify.mobile_app (friendly_name="Mobile Push")
+ EXISTING AUTOMATIONS: (none)
+ Selora AI: {"intent":"automation","response":"This automation watches [[entity:lock.front_door|Front Door Lock]] and sends a **Mobile Push** notification whenever it changes to **unlocked** between midnight and 6 AM. It only fires during those overnight hours so it doesn't spam you during the day. Want me to also flash the entryway light when this happens?","description":"Sends a mobile push when the Front Door Lock state changes to 'unlocked' during overnight hours (00:00-06:00). Triggers on state change; only acts inside the time window. Targets lock.front_door for the trigger and notify.mobile_app for the action.","automation":{"alias":"Late Night Unlock Alert","description":"Notify mobile if front door unlocked between midnight and 6 AM.","triggers":[{"trigger":"state","entity_id":"lock.front_door","to":"unlocked"}],"conditions":[{"condition":"time","after":"00:00:00","before":"06:00:00"}],"actions":[{"service":"notify.mobile_app","data":{"message":"Front Door Lock unlocked overnight."}}]}}
+
+ User: Make me something useful.
+ AVAILABLE ENTITIES: light.kitchen (friendly_name="Kitchen Lights"); light.living_room (friendly_name="Living Room Lights"); lock.front_door (friendly_name="Front Door Lock"); sensor.sun (friendly_name="Sun")
+ EXISTING AUTOMATIONS: (none)
+ Selora AI: {"intent":"automation","response":"Here's a useful one: at sunset, this turns on [[entity:light.kitchen|Kitchen Lights]] and [[entity:light.living_room|Living Room Lights]], and locks [[entity:lock.front_door|Front Door Lock]] for the evening. It runs every day automatically so you don't have to think about it. Want me to also adjust the thermostat at the same time?","description":"Runs every day at sunset. Turns on Kitchen Lights and Living Room Lights, then locks the Front Door Lock. Targets light.kitchen, light.living_room, and lock.front_door.","automation":{"alias":"Evening Routine","description":"At sunset: lights on, front door locked.","triggers":[{"trigger":"sun","event":"sunset"}],"conditions":[],"actions":[{"service":"light.turn_on","target":{"entity_id":["light.kitchen","light.living_room"]},"data":{}},{"service":"lock.lock","target":{"entity_id":"lock.front_door"},"data":{}}]}}"""
+
+ # Generation params match training-time + integration defaults. repeat_penalty
+ # is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+ # distorts the slim-schema outputs (drops short keys like "r").
+ PARAMETER temperature 0.0
+ PARAMETER repeat_penalty 1.0
+ PARAMETER repeat_last_n 256
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
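Several of the RULES in the automation prompt are mechanically checkable on the consumer side. The sketch below shows one possible post-generation check covering three of them (non-empty `triggers` and `actions`, string-typed `to`/`from`, and `HH:MM:SS` time triggers). `check_automation` is a hypothetical helper for illustration; the source does not describe how the integration validates output.

```python
import re

TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def check_automation(envelope: dict) -> list:
    """Return a list of rule violations; an empty list means these checks passed."""
    problems = []
    automation = envelope.get("automation", {})
    triggers = automation.get("triggers", [])
    actions = automation.get("actions", [])

    if not triggers:
        problems.append("triggers must contain at least one entry")
    if not actions:
        problems.append("actions must contain at least one entry")

    for trig in triggers:
        # State 'to'/'from' must be strings such as "on"/"off", never booleans.
        for key in ("to", "from"):
            if key in trig and not isinstance(trig[key], str):
                problems.append(f"trigger '{key}' must be a string, got {trig[key]!r}")
        # Time triggers must use an "HH:MM:SS" string.
        if trig.get("trigger") == "time" and not TIME_RE.match(str(trig.get("at", ""))):
            problems.append("time trigger 'at' must be HH:MM:SS")
    return problems
```

A caller could reject or retry a generation whenever the returned list is non-empty; checks for durations, conditions, and entity membership would follow the same pattern.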
Modelfile.clarification ADDED
@@ -0,0 +1,40 @@
+ # Ollama Modelfile for Selora-AI v0.4.7 / clarification specialist (Qwen3 1.7B + LoRA)
+ # Build: ollama create selora-qwen-clarification -f Modelfile.clarification
+ # Run: ollama run selora-qwen-clarification
+
+ FROM ./qwen3_17b_base.f16.gguf
+ ADAPTER ./selora-v047-clarification.f16.gguf
+
+ # Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+ # integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+ # trained).
+ TEMPLATE """{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}{{ if .Prompt }}<|im_start|>user
+ /no_think {{ .Prompt }}<|im_end|>
+ {{ end }}<|im_start|>assistant
+ """
+
+ # v0.4.7 per-specialist system prompt (verbatim from prompts/clarification_system_prompt.txt).
+ SYSTEM """You are Selora AI's clarification specialist for Home Assistant.
+
+ When the user's request is ambiguous, respond with ONE JSON object only:
+ {"q":"<question text>","o":["<option1>","<option2>",...]}
+
+ Rules:
+ - q: short, specific clarifying question. 1 sentence max.
+ - o: optional array of suggested answers. Omit the o key when free-form input is appropriate.
+ - Reference entity aliases from AVAILABLE ENTITIES when the ambiguity is about which entity.
+ - Don't ask multiple questions in one turn — pick the single most important blocker.
+ - Don't restate the user's full request; ask the one thing you need.
+
+ Output JSON only — no narration, no markdown fences, no chain-of-thought."""
+
+ # Generation params match training-time + integration defaults. repeat_penalty
+ # is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+ # distorts the slim-schema outputs (drops short keys like "r").
+ PARAMETER temperature 0.0
+ PARAMETER repeat_penalty 1.0
+ PARAMETER repeat_last_n 256
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
Modelfile.command ADDED
@@ -0,0 +1,42 @@
+ # Ollama Modelfile for Selora-AI v0.4.7 / command specialist (Qwen3 1.7B + LoRA)
+ # Build: ollama create selora-qwen-command -f Modelfile.command
+ # Run: ollama run selora-qwen-command
+
+ FROM ./qwen3_17b_base.f16.gguf
+ ADAPTER ./selora-v047-command.f16.gguf
+
+ # Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+ # integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+ # trained).
+ TEMPLATE """{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}{{ if .Prompt }}<|im_start|>user
+ /no_think {{ .Prompt }}<|im_end|>
+ {{ end }}<|im_start|>assistant
+ """
+
+ # v0.4.7 per-specialist system prompt (verbatim from prompts/command_system_prompt.txt).
+ SYSTEM """You are Selora AI's command specialist for Home Assistant.
+
+ Given a user command and the AVAILABLE ENTITIES list, respond with ONE JSON object only:
+ {"c":[{"s":"<service>","e":"<entity_id>","d":{<optional params>}}],"r":"<short confirmation>"}
+
+ Rules:
+ - c: ordered array of one or more service calls. Calls execute in array order.
+ - s: HA service in "domain.action" form (e.g. "light.turn_on", "lock.lock", "media_player.play_media", "scene.turn_on").
+ - e: canonical entity_id from AVAILABLE ENTITIES. Never use the human alias — always the entity_id.
+ - d: service parameters object. Omit the d key entirely when there are no params (do not include "d":{}).
+ - r: ≤ 1 sentence past-tense confirmation describing what got done (e.g. "Kitchen light on.").
+ - The service domain (before the dot) must match the entity_id's domain. light.turn_on goes with light.* entities, lock.lock goes with lock.* entities, etc.
+ - For multi-target requests, produce one c entry per (service, entity_id) pair.
+
+ Output JSON only — no narration, no markdown fences, no chain-of-thought."""
+
+ # Generation params match training-time + integration defaults. repeat_penalty
+ # is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+ # distorts the slim-schema outputs (drops short keys like "r").
+ PARAMETER temperature 0.0
+ PARAMETER repeat_penalty 1.0
+ PARAMETER repeat_last_n 256
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
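The command prompt defines a compact envelope (`c`, `s`, `e`, `d`, `r`) and requires the service domain to match the entity domain. As a rough illustration, a consumer could expand that envelope into verbose service-call dicts and enforce the domain rule as follows. `expand_calls` and the expanded dict layout are assumptions made for this sketch, not the integration's actual code.

```python
import json


def expand_calls(raw: str) -> list:
    """Expand the compact {"c":[{"s","e","d"}]} envelope into verbose service calls."""
    envelope = json.loads(raw)
    calls = []
    for call in envelope.get("c", []):
        service, entity_id = call["s"], call["e"]
        # Rule from the prompt: the domain before the dot must match on both sides.
        if service.split(".", 1)[0] != entity_id.split(".", 1)[0]:
            raise ValueError(f"domain mismatch: {service} vs {entity_id}")
        calls.append({
            "service": service,
            "target": {"entity_id": entity_id},
            "data": call.get("d", {}),  # 'd' is omitted when there are no params
        })
    return calls
```

Calls are returned in array order, matching the prompt's statement that they execute in that order.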
README.md CHANGED
@@ -1,173 +1,146 @@
- ---
- license: apache-2.0
- base_model: Qwen/Qwen3-1.7B
- tags:
- - text-generation
- - qwen
- - qwen3
- - lora
- - home-assistant
- - home-automation
- - smart-home
- - iot
- - instruction-tuned
- - tool-use
- language:
- - en
- library_name: transformers
- pipeline_tag: text-generation
  ---
 
- # Selora AI
-
- Selora AI is an instruction-tuned [Qwen3 1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) model purpose-built for
- [**Home Assistant**](https://www.home-assistant.io/), the open-source **smart home**
- platform. Four specialist LoRA adapters cover device control, **home automation**
- authoring, Q&A, and clarification — each with its own trained system prompt and
- output shape. The `answer` adapter also emits a `query_state` tool envelope for
- live device-state queries against the Home Assistant REST API.
-
- Selora AI powers the [Selora AI Home Assistant integration](https://gitlab.com/selorahomes/products/selora-ai/ha-integration) and runs locally on Apple Silicon, Linux, or Windows via
- [Ollama](#ollama) or [llama.cpp](#llamacpp), or in the cloud via [vLLM](#vllm-cloud). It targets self-hosted **IoT** deployments where users want their
- home automation assistant to stay private and offline-first.
-
- ## Use cases
-
- - **Voice and chat control of smart-home devices** — "turn off the kitchen
- lights", "set the thermostat to 68", "open the garage door" — resolved against
- live Home Assistant entity state.
- - **Natural-language home automation creation** — describe an automation in
- plain English ("when the front door opens after 10pm, turn on the porch
- light") and Selora returns valid Home Assistant YAML with a risk assessment
- for review before deployment.
- - **Scene and routine orchestration** — chain actions across multiple entities
- ("good night" → lock doors, dim bedroom lights, set thermostat) without
- hand-writing scripts.
- - **Q&A about your home** — "is the laundry running?", "what's the temperature
- upstairs?" — answered via a `query_state` tool call against the HA REST API.
- - **Privacy-first home assistant** — runs entirely on local hardware
- (Raspberry Pi 5, Mac mini, NUC-class boxes) with no cloud dependency, so
- device commands and home telemetry never leave the LAN.
-
- ## Specialists
-
- | Adapter | Intent | Output shape |
- | --- | --- | --- |
- | `command` | "Turn off the kitchen lights" | `{intent:"command",response,calls:[…]}` |
- | `automation` | "Wake up lights at 6:30 AM" | `{intent:"automation",automation:{triggers,actions,…}}` |
- | `answer` | Q&A / small talk | `{intent:"answer",response}` |
- | `clarification` | Ask the user a follow-up | `{intent:"clarification",response}` |
-
- The HA integration's `selora_local` provider classifies each request to
- one of the four specialists before the call (cheap regex
- pre-classifier), then sends the request with `model:
- selora-v1-{specialist}`. Backends that support multi-LoRA
- (llama-server's `/lora-adapters`, vLLM `--enable-lora`) activate the
- matching adapter.
-
- ## Quick start
-
- ### Ollama
 
- ```bash
- ollama pull selora/commands
- ollama run selora/commands
  ```
 
- Modelfiles for all four specialists live in [`ollama/`](ollama/) and
- are also published as separate Ollama models.
 
- ### llama.cpp
 
- ```bash
- llama-server \
- --model qwen3_17b_base.Q4_K_M.gguf \
- --lora-init-without-apply \
- --lora qwen3_17b_command.lora.gguf \
- --lora qwen3_17b_automation.lora.gguf \
- --lora qwen3_17b_answer.lora.gguf \
- --lora qwen3_17b_clarification.lora.gguf \
- --ctx-size 8192
  ```
 
- POST to `/lora-adapters` to switch the active LoRA before each
- `/v1/chat/completions` call.
 
- ### vLLM (cloud)
 
  ```bash
- python -m vllm.entrypoints.openai.api_server \
- --model ./qwen3_17b_hf \
- --enable-lora --max-loras 4 --max-lora-rank 32 \
- --lora-modules \
- selora-v1-commands=/path/to/peft/command \
- selora-v1-automations=/path/to/peft/automation \
- selora-v1-answers=/path/to/peft/answer \
- selora-v1-clarifications=/path/to/peft/clarification
  ```
 
- vLLM activates the matching LoRA based on the request's `model` field;
- no extra routing layer needed.
 
- ## Generation parameters
 
- ```json
- {
- "temperature": 0.0,
- "repeat_penalty": 1.15,
- "repeat_last_n": 256,
- "max_tokens": 384,
- "stop": ["<|im_end|>", "<|endoftext|>"]
- }
  ```
 
- Bump `max_tokens` to 1536 for automation requests (longer JSON output).
-
- ## Training
-
- Base: [Qwen3 1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) fine-tuned
- with [Apple mlx-lm](https://github.com/ml-explore/mlx-examples). Each
- specialist has its own LoRA (rank 8–28, scale 20) trained on a curated
- HA-domain corpus (forum threads, HA docs, synthetic command /
- automation pairs). System prompts trained per-specialist; see
- [`prompts/`](prompts/). The `answer` adapter went through a sequential
- continuation pass that added a `query_state` tool envelope on top of
- the original answer-only training distribution; that's preserved in
- the augmented `prompts/answers.txt` and the `Modelfile.answers` SYSTEM
- block.
-
- ## Evaluation
-
- 10/10 parity pass rate on the four-intent suite (command, automation,
- answer, clarification — plus screenshot regressions). Validator and
- scenarios live in [`parity/`](parity/).
-
- ## Files in this bundle
-
- | Artifact | Purpose | Distribution |
- | --- | --- | --- |
- | `qwen3_17b_base.IQ4_XS.gguf` | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
- | `qwen3_17b_{intent}.lora.gguf` (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
- | `Modelfile.{intent}` (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
- | `prompts/{intent}.txt` (×4) | Plain-text trained prompts (reference / testing) | this repo |
-
- The full-precision (f16) base and HF safetensors set used by vLLM /
- TGI / SageMaker live separately in the cloud bundle and are not yet
- mirrored to Hugging Face.
-
- ## Citation
-
- ```bibtex
- @misc{selora-ai-2026,
- title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
- author = {{Selora Homes}},
- year = {2026},
- url = {https://huggingface.co/selora-homes/selora-ai}
- }
  ```
 
- Base model citation: Qwen Team, *Qwen3 Technical Report* (2025).
 
- ## License
 
- Apache-2.0 (matches the Qwen3 base license).
+ **Artifacts:** [Selora-AI-Local Package Registry — package 60760758](https://gitlab.com/selorahomes/products/selora-ai/models/-/packages/60760758)
+
  ---
 
+ # SeloraAI v0.4.7 — Local
 
+ **Base:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ **Format:** GGUF **F16** base (~3.8 GB, no quantization) + 4 per-specialist LoRA adapters (F16)
+ **Architecture:** 4 LoRA specialists routed by a runtime classifier
+ **License:** Apache-2.0 (inherited from Qwen3)
+
+ ## Specialists (4 LoRAs)
+
+ | Slot | Specialist | Iters | v0.4.3 final val | v0.4.7 final val | Δ |
+ |---:|---|---:|---:|---:|---:|
+ | 0 | command | 800 | 0.016 | **0.012** | -25% |
+ | 1 | automation | 900 | 0.005 | **0.004** | -20% |
+ | 2 | answer | 600 | 0.014 | **0.011** | -21% |
+ | 3 | clarification | 400 | 0.015 | **0.011** | -27% |
+
+ Every specialist beats its v0.4.3 final val by 20-27%, and the lower starting val on v0.4.7 (e.g. answer 3.58 → 3.01 at iter 1) reflects cleaner training data — the entity-block reconciliation and the negation-answer reshape mean the base model sees less "surprise" structure on the first eval pass.
+
+ ## What changed since v0.4.3
+
+ ### Recipe specialist dropped from the bundle
+
+ Recipe handling moves to a deterministic pipeline outside the model. The bundle is smaller (4 LoRAs instead of 5, ~120 MB → ~82 MB of LoRAs) and inference doesn't pay the recipe specialist's load cost. Consumer-side intent classifiers should map "install / set up / recipe" requests to the pipeline path, not to a model specialist.
+
+ ### Entity-block format reconciled with the integration
+
+ `format_entities_block` in `scripts/gen_utils.py` now emits the exact per-line shape produced by `_format_entity_line` in `custom_components/selora_ai/llm_client/sanitize.py`:
+
+ ```
+ AVAILABLE ENTITIES:
+ - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
+ - entity_id=sensor.sun; state=below_horizon; friendly_name=Sun
  ```
 
+ Whitelisted attribute keys (`device_class`, `unit_of_measurement`, `percentage`, `current_temperature`, `target_temperature`, `brightness`) are appended in the same order on both sides. This eliminates the train-vs-inference drift that previously sent the model out-of-distribution on entity-context blocks.
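To make the per-line shape concrete, here is a minimal sketch of a formatter that produces lines like the ones shown in the README. It is not the repository's `_format_entity_line`: the function name `entity_line`, the `; key=value` rendering of attributes, and the absence of leading indentation are all assumptions made for illustration.

```python
# Attribute keys in the order listed in the README; how they are rendered
# ("; key=value") is an assumption for this sketch.
ATTRIBUTE_ORDER = (
    "device_class",
    "unit_of_measurement",
    "percentage",
    "current_temperature",
    "target_temperature",
    "brightness",
)


def entity_line(entity_id, state, friendly_name, attributes=None):
    """Format one AVAILABLE ENTITIES line; whitelisted attributes keep a fixed order."""
    parts = [
        f"entity_id={entity_id}",
        f"state={state}",
        f"friendly_name={friendly_name}",
    ]
    attributes = attributes or {}
    for key in ATTRIBUTE_ORDER:
        if key in attributes:
            parts.append(f"{key}={attributes[key]}")
    return "- " + "; ".join(parts)


def entities_block(lines):
    """Join formatted lines under the AVAILABLE ENTITIES header."""
    return "AVAILABLE ENTITIES:\n" + "\n".join(lines)
```

Iterating over a fixed tuple rather than the attribute dict is what keeps the order stable regardless of how the caller built the dict, which is the property the reconciliation relies on.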
 
 
+ ### `_negation_answer` multi-turn reshape
 
+ `gen_multiturn._negation_answer` was a two-turn chain whose first assistant turn was a command envelope (`{"c":[...],"r":"…want me to also …?"}`) and whose second was the answer (`{"r":"OK, no problem."}`). Routing the whole chain to the answer specialist's training set meant mlx-lm trained on every assistant turn in the example — so the answer LoRA's gradient was reinforced on the command envelope and learned to occasionally emit `c` arrays.
+
+ The reshape folds the prior command turn into the user message as plain-English context, and only the final answer envelope reaches the LoRA's gradient:
+
+ Before (one example, 5 messages):
+ ```
+ [system: answer prompt]
+ [user: "turn off the yard light"]
+ [assistant: {"c":[{"s":"light.turn_off","e":"light.yard"}],
+ "r":"Yard light off. Want me to turn off the kid bath light too?"}]
+ [user: "no thanks"]
+ [assistant: {"r":"OK, no problem."}]
+ ```
+
+ After (one example, 3 messages — same conversational context):
+ ```
+ [system: answer prompt]
+ [user: "earlier you turned off the yard light and asked if I
+ wanted the kid bath light off too — no thanks"]
+ [assistant: {"r":"OK, no problem."}]
  ```
 
+ Multi-turn awareness at inference is unchanged — the integration still feeds prior conversation history via `_SELORA_LOCAL_HISTORY_TURNS=3`. The cleaning was on the training-data side only.
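The structural effect of the reshape can be sketched in a few lines. This is only an illustration of the 5-message to 3-message fold, assuming chat messages are dicts with `role` and `content` keys; `fold_negation_chain` is a hypothetical name, and the folded wording below simply quotes the earlier confirmation rather than reproducing the plain-English phrasing the real generator emits.

```python
import json


def fold_negation_chain(messages: list) -> list:
    """Collapse [system, user, assistant(command), user, assistant(answer)]
    into [system, user(with folded context), assistant(answer)]."""
    system, _first_user, command_turn, reply, answer = messages
    # Pull only the human-readable confirmation out of the command envelope,
    # so no "c" array survives into the training example.
    confirmation = json.loads(command_turn["content"])["r"]
    folded = f'earlier you said "{confirmation}" and I am replying: {reply["content"]}'
    return [system, {"role": "user", "content": folded}, answer]
```

After the fold, the only assistant turn left in the example is the answer envelope, so a trainer that computes loss on every assistant message no longer sees a command envelope in the answer specialist's data.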
 
 
+ ### Pre-training audit script
+
+ `tools/audit.py` runs 22-29 checks before training (tools/generators/prompts/configs import cleanly, cross-layer specialist lists agree as sets, prompts are ASCII-safe, token-length p99 within the 4096 budget). Catches drift early instead of finding it after a training run.
+
+ ## Training notes
+
+ - Framework: **mlx-lm** on M5 Max (64 GB unified memory)
+ - Optimizer: Adam, learning rate 1e-4, batch size 4, max-seq-length 4096, grad-checkpoint enabled
+ - Data: synthetic corpus generated by `v0.4.7/scripts/gen_*.py` from 10 curated home specs + procedural variants. ~45,694 examples total (post-filter, dropped 1,036 over the 14k-char threshold): command 16,081 / automation 14,928 / answer 11,385 / clarification 3,300.
+ - Slot order is the **API contract**: 0=command, 1=automation, 2=answer, 3=clarification. Recipe is no longer a slot.
+ - LoRA ranks set via per-specialist YAML configs at `v0.4.7/training/configs/<spec>.yaml`.
+
+ ## Files in this package
+
+ - `qwen3_17b_base.f16.gguf` (~3.8 GB) — F16 base, no quantization (keeps LoRA evaluation parity with training)
+ - 4 × `selora-v047-<specialist>.f16.gguf` (9.5 MB – 36 MB each, ~82 MB total)
+ - `manifest.json` — SHA-256 sums, sizes, slot order, training metadata
+ - 4 × `<specialist>_system_prompt.txt` — verbatim system prompts that produced the LoRAs
+ - 4 × `Modelfile.<specialist>` — optional [Ollama](https://ollama.com) Modelfiles (see "Running with Ollama" below)
87
+ ## Running with Ollama (optional)
88
+
89
+ If you have [Ollama](https://ollama.com) installed, you can use the same base + LoRA artifacts through Ollama instead of llama.cpp. The Modelfiles in this package bake in the ChatML template (with `/no_think`), the per-specialist system prompt, and the generation parameters used at training time — so the only thing you need to do is build each specialist once and run it.
90
+
91
+ llama.cpp remains the reference runtime; Ollama is offered as a convenience for users who already have it on their system.
92
+
93
+ ### Installing Ollama
94
+
95
+ These specialists require **Ollama 0.30 or later, installed via the official installer**. Two reasons to be specific:
96
+
97
+ - **Pre-0.30 Ollama lacks LoRA inference support** — `ollama run` fails with `"failed to initialize model: loras are not yet implemented"`. The build (`ollama create`) succeeds; only inference is blocked. Upgrade if you're on an older version.
98
+ - **`brew install ollama` is currently broken for all GGUF models on 0.30+.** The homebrew formula's `go build` does not produce the new `llama-server` binary that Ollama 0.30 split out for GGUF execution, so any GGUF model — including these — fails with `"error starting llama-server: llama-server binary not found"`. Tracking: [homebrew-core#285917](https://github.com/Homebrew/homebrew-core/issues/285917); fix queued in [PR #285963](https://github.com/Homebrew/homebrew-core/pull/285963). Until that lands, use the official installer.
99
+
100
+ Install via one of:
101
 
102
  ```bash
103
+ # macOS / Linux: one-line installer
104
+ curl -fsSL https://ollama.com/install.sh | sh
105
+
106
+ # macOS app bundle
107
+ # https://ollama.com/download
 
 
 
108
  ```
109
 
110
+ ### Build
 
111
 
112
+ Download the base, all four LoRA `.f16.gguf` files, and the four `Modelfile.*` files into a single directory, then from that directory:
113
 
114
+ ```bash
115
+ ollama create selora-qwen-command -f Modelfile.command
116
+ ollama create selora-qwen-automation -f Modelfile.automation
117
+ ollama create selora-qwen-answer -f Modelfile.answer
118
+ ollama create selora-qwen-clarification -f Modelfile.clarification
 
 
 
119
  ```
120
 
121
+ The base GGUF (`qwen3_17b_base.f16.gguf`) is stored once in Ollama's blob store and shared across all four specialists; only the ~10–37 MB LoRA adapter is added per slot.
122
+
123
+ ### Run
124
+
125
+ ```bash
126
+ ollama run selora-qwen-command
127
+ # >>> turn on the kitchen light
128
+ # {"c":[{"s":"light.turn_on","e":"light.kitchen"}],"r":"Kitchen light on."}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
129
  ```
130
 
131
+ Or hit the Ollama HTTP API directly the template and system prompt are already attached:
132
+
133
+ ```bash
134
+ curl http://localhost:11434/api/generate -d '{
135
+ "model": "selora-qwen-command",
136
+ "prompt": "AVAILABLE ENTITIES:\n - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights\n\nturn on the kitchen light",
137
+ "stream": false
138
+ }'
139
+ ```
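The same request can be issued from code. Below is a minimal sketch using only the Python standard library against Ollama's `/api/generate` endpoint, assuming a local server on the default port and a specialist already built under the model name used in the README. The helper names `build_prompt` and `generate` are illustrative, and the example assumes the model returns a bare JSON object as the prompts instruct.

```python
import json
import urllib.request


def build_prompt(entity_lines, user_text):
    """Prepend the AVAILABLE ENTITIES block the specialists were trained on."""
    return "AVAILABLE ENTITIES:\n" + "\n".join(entity_lines) + "\n\n" + user_text


def generate(model, prompt, host="http://localhost:11434"):
    """POST a non-streaming request and parse the model's JSON envelope."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # With stream=false, Ollama returns the generated text in the "response" field.
    return json.loads(payload["response"])


if __name__ == "__main__":
    prompt = build_prompt(
        [" - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights"],
        "turn on the kitchen light",
    )
    print(generate("selora-qwen-command", prompt))
```

Parsing the `response` field with `json.loads` will raise if the model emits anything other than a single JSON object, which is a reasonable place to hook in a retry or a fallback.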
 
+ ### Notes
 
+ - Pick the specialist that matches the intent: `command` for direct device control, `automation` for "create an automation that…", `answer` for state questions, `clarification` when input is ambiguous. In a full HA integration the runtime classifier picks for you — when calling Ollama directly, you choose by picking which model to run.
+ - Keep `repeat_penalty` at `1.0`. Raising it (Ollama's default elsewhere is 1.1) measurably distorts the slim-schema output and can drop short keys like `r`.
+ - The user message must include an `AVAILABLE ENTITIES:` block in the exact format above — these LoRAs were trained on that shape and go out-of-distribution without it.
+ - Minimum Ollama version: **0.30** (LoRA inference). See "Installing Ollama" above for installer guidance.
manifest.json CHANGED
@@ -1,54 +1,50 @@
1
  {
2
  "name": "selora-ai-local",
3
  "version": "0.4.7",
4
- "description": "Selora AI v0.4.7 \u2014 Qwen3-1.7B Q6_K base + 4 LoRA specialists. Hub-optimal base quant (fits Vega 8 VRAM, no GTT spill). Specialists retrained on a compacted-JSON corpus (automation rank 32); command specialist prompt updated, other three unchanged from v0.4.6.",
5
  "base_model": {
6
  "id": "Qwen/Qwen3-1.7B",
7
  "format": "gguf",
8
- "dtype": "Q6_K",
9
- "filename": "qwen3_17b_base.Q6_K.gguf",
10
- "size_bytes": 1673006880,
11
- "sha256": "a00bbdb411872149d73e1a0683b9b8a9f13cf74f98ba70ff8e8e430d9a093179"
12
  },
13
  "loras": [
14
  {
15
  "slot": 0,
16
  "name": "command",
17
- "rank": 16,
18
  "filename": "selora-v047-command.f16.gguf",
19
  "size_bytes": 19938528,
20
- "sha256": "49ba46bc054259409f5cd52eb3d9971101ed858fba27c6678c89e145815af515"
21
  },
22
  {
23
  "slot": 1,
24
  "name": "automation",
25
- "rank": 32,
26
  "filename": "selora-v047-automation.f16.gguf",
27
- "size_bytes": 59791968,
28
- "sha256": "32e5633f0e554fec5e336d993647065f85a4fffd348366cdc3417ca34824b527"
29
  },
30
  {
31
  "slot": 2,
32
  "name": "answer",
33
- "rank": 12,
34
  "filename": "selora-v047-answer.f16.gguf",
35
  "size_bytes": 14957792,
36
- "sha256": "7584a04e13dd3e4973bfeb89ed01b7b29b3288d768d7bee15bcea3de89ff50c7"
37
  },
38
  {
39
  "slot": 3,
40
  "name": "clarification",
41
- "rank": 8,
42
  "filename": "selora-v047-clarification.f16.gguf",
43
  "size_bytes": 9977056,
44
- "sha256": "75c06c0efbd9ca1e7108ec6ec5811f68e2dc5639d01ef583148a151b11398847"
45
  }
46
  ],
47
  "system_prompts": {
48
  "command": {
49
  "filename": "command_system_prompt.txt",
50
- "size_bytes": 1374,
51
- "sha256": "0fa2b1669dedca18ddba2cebd7f72cd7cff0f7431b87ca4166a4bf60c7aed697"
52
  },
53
  "automation": {
54
  "filename": "automation_system_prompt.txt",
@@ -68,29 +64,28 @@
68
  },
69
  "runtime": {
70
  "cache_prompt": true,
71
- "ctx_size": 8192
72
  },
73
  "training": {
74
  "framework": "mlx-lm",
75
  "base_model_repo": "Qwen/Qwen3-1.7B",
76
- "scale": 20.0,
77
- "rank_per_specialist": {
78
- "command": 16,
79
- "automation": 32,
80
- "answer": 12,
81
- "clarification": 8
82
- },
83
  "iterations_per_specialist": {
84
- "command": 800,
85
- "automation": 1800,
86
  "answer": 600,
87
- "clarification": 450
 
88
  },
89
  "examples_per_specialist": {
90
- "command": 11000,
91
- "automation": 10000,
92
- "answer": 5500,
93
- "clarification": 3000
94
  }
95
  }
96
  }
 
1
  {
2
  "name": "selora-ai-local",
3
  "version": "0.4.7",
4
+ "description": "Selora AI v0.4.7 \u2014 Qwen3-1.7B F16 base + 4 LoRA specialists with slim action-then-confirm output schemas. Multi-state entity context (per-entity attribute tails in AVAILABLE ENTITIES) for richer single-turn grounding. Inference: cache_prompt enabled to amortize system-prompt KV cache across requests.",
5
  "base_model": {
6
  "id": "Qwen/Qwen3-1.7B",
7
  "format": "gguf",
8
+ "dtype": "F16",
9
+ "filename": "qwen3_17b_base.f16.gguf",
10
+ "size_bytes": 4069678880,
11
+ "sha256": "3e4009f0d96955a45f29aa77bded839d376d7832823c6909f76c84ace81dc445"
12
  },
13
  "loras": [
14
  {
15
  "slot": 0,
16
  "name": "command",
 
17
  "filename": "selora-v047-command.f16.gguf",
18
  "size_bytes": 19938528,
19
+ "sha256": "b10b5131157698400ee9fafac08ab4101b81230a79ad067eb8f89bd4e29a6273"
20
  },
21
  {
22
  "slot": 1,
23
  "name": "automation",
 
24
  "filename": "selora-v047-automation.f16.gguf",
25
+ "size_bytes": 37374880,
26
+ "sha256": "1bdd8c54cb36332889498a67046f01b55de7e5ac019e350419ba98a9c104f78f"
27
  },
28
  {
29
  "slot": 2,
30
  "name": "answer",
 
31
  "filename": "selora-v047-answer.f16.gguf",
32
  "size_bytes": 14957792,
33
+ "sha256": "9ec14105e7119675a75c1e166c132298886f4a25e57628c4ef720fce2080171e"
34
  },
35
  {
36
  "slot": 3,
37
  "name": "clarification",
 
38
  "filename": "selora-v047-clarification.f16.gguf",
39
  "size_bytes": 9977056,
40
+ "sha256": "071558279b1bc8c8609fc63a1524604aa7fcf721c5bd6d3ecdf1f55ad16f5e1a"
41
  }
42
  ],
43
  "system_prompts": {
44
  "command": {
45
  "filename": "command_system_prompt.txt",
46
+ "size_bytes": 1071,
47
+ "sha256": "9921c6fef09c6ebad4a2ed4fad1dbe7e76efe0bfe4e532bf7c7fe096864de6a4"
48
  },
49
  "automation": {
50
  "filename": "automation_system_prompt.txt",
 
64
  },
65
  "runtime": {
66
  "cache_prompt": true,
67
+ "ctx_size": 4096
68
  },
69
  "training": {
70
  "framework": "mlx-lm",
71
  "base_model_repo": "Qwen/Qwen3-1.7B",
72
+ "optimizer": "adam",
73
+ "learning_rate": 0.0001,
74
+ "batch_size": 4,
75
+ "max_seq_length": 4096,
76
+ "english_only": true,
77
+ "data_source": "synthetic \u2014 slim schemas in slim_schemas.md, generated by scripts/gen_{intent}.py from 10 curated home specs + procedural variants; service_matrix.py covers 49 (domain, service) pairs. tools.home_specs.diversify_states() injects multi-state attributes per training example.",
 
78
  "iterations_per_specialist": {
79
+ "command": 750,
 
80
  "answer": 600,
81
+ "clarification": 450,
82
+ "automation": 1050
83
  },
84
  "examples_per_specialist": {
85
+ "command": 8800,
86
+ "answer": 6600,
87
+ "clarification": 3300,
88
+ "automation": 6600
89
  }
90
  }
91
  }
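The `size_bytes` and `sha256` fields above make it possible to check a download before running `ollama create`. The following is a sketch only, assuming the base and LoRA files sit next to `manifest.json` in one directory. The function names are hypothetical, and the system prompt files are skipped because their on-disk location relative to the manifest is not stated here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so the multi-GB base model never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entries(manifest):
    """Yield the base model entry followed by each LoRA entry."""
    yield manifest["base_model"]
    yield from manifest["loras"]


def verify(directory, manifest_name="manifest.json"):
    """Return a list of problems; an empty list means every file matched."""
    directory = Path(directory)
    manifest = json.loads((directory / manifest_name).read_text(encoding="utf-8"))
    problems = []
    for entry in manifest_entries(manifest):
        path = directory / entry["filename"]
        if not path.is_file():
            problems.append(f"{entry['filename']}: missing")
        elif path.stat().st_size != entry["size_bytes"]:
            # Cheap check first: a size mismatch makes hashing pointless.
            problems.append(f"{entry['filename']}: size mismatch")
        elif sha256_of(path) != entry["sha256"]:
            problems.append(f"{entry['filename']}: sha256 mismatch")
    return problems
```

Comparing sizes before hashing keeps the common failure (a truncated download) fast to detect.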
prompts/command_system_prompt.txt CHANGED
@@ -5,11 +5,11 @@ Given a user command and the AVAILABLE ENTITIES list, respond with ONE JSON obje
5
 
6
  Rules:
7
  - c: ordered array of one or more service calls. Calls execute in array order.
8
- - s: HA service in "domain.action" form. Only these domains are accepted for immediate execution: light, switch, fan, media_player, climate, input_boolean, scene, cover. Examples: "light.turn_on", "switch.turn_off", "media_player.media_play", "cover.open_cover", "scene.turn_on". Do not emit lock.*, alarm_*, script.*, notify.*, or any other domain — those are blocked by the safety policy and the call will fail.
9
  - e: canonical entity_id from AVAILABLE ENTITIES. Never use the human alias — always the entity_id.
10
  - d: service parameters object. Omit the d key entirely when there are no params (do not include "d":{}).
11
  - r: ≤ 1 sentence past-tense confirmation describing what got done (e.g. "Kitchen light on.").
12
- - The service domain (before the dot) must match the entity_id's domain. light.turn_on goes with light.* entities, cover.open_cover goes with cover.* entities, etc.
13
  - For multi-target requests, produce one c entry per (service, entity_id) pair.
14
 
15
  Output JSON only — no narration, no markdown fences, no chain-of-thought.
 
5
 
6
  Rules:
7
  - c: ordered array of one or more service calls. Calls execute in array order.
8
+ - s: HA service in "domain.action" form (e.g. "light.turn_on", "lock.lock", "media_player.play_media", "scene.turn_on").
9
  - e: canonical entity_id from AVAILABLE ENTITIES. Never use the human alias — always the entity_id.
10
  - d: service parameters object. Omit the d key entirely when there are no params (do not include "d":{}).
11
  - r: ≤ 1 sentence past-tense confirmation describing what got done (e.g. "Kitchen light on.").
12
+ - The service domain (before the dot) must match the entity_id's domain. light.turn_on goes with light.* entities, lock.lock goes with lock.* entities, etc.
13
  - For multi-target requests, produce one c entry per (service, entity_id) pair.
14
 
15
  Output JSON only — no narration, no markdown fences, no chain-of-thought.
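The rules above can be checked mechanically on the consumer side before any service call is executed. The sketch below is an assumption about how such a check might look, not the integration's actual validator; `validate_command` is a hypothetical name, and `r` is only type-checked when present because this excerpt does not say whether it is mandatory.

```python
import json


def validate_command(raw, available_entity_ids):
    """Return a list of rule violations for one command-specialist reply."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(obj, dict):
        return ["top level must be a JSON object"]

    errors = []
    calls = obj.get("c")
    if not isinstance(calls, list) or not calls:
        errors.append("c must be a non-empty array")
        calls = []

    for i, call in enumerate(calls):
        if not isinstance(call, dict):
            errors.append(f"c[{i}] must be an object")
            continue
        service, entity = call.get("s"), call.get("e")
        if not isinstance(service, str) or service.count(".") != 1:
            errors.append(f"c[{i}].s must be in domain.action form")
            continue
        if not isinstance(entity, str) or entity not in available_entity_ids:
            errors.append(f"c[{i}].e is not in AVAILABLE ENTITIES")
            continue
        # Rule: the service domain must match the entity_id's domain.
        if service.split(".")[0] != entity.split(".")[0]:
            errors.append(f"c[{i}]: {service} does not match {entity}")
        # Rule: omit d entirely when there are no params.
        if call.get("d") == {}:
            errors.append(f"c[{i}].d must be omitted when empty")

    if "r" in obj and not isinstance(obj["r"], str):
        errors.append("r must be a string")
    return errors
```

Because the new prompt no longer restricts domains, any allow-list for sensitive services such as `lock.*` would have to be enforced by the caller; this sketch does not attempt that.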
qwen3_17b_base.f16.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3e4009f0d96955a45f29aa77bded839d376d7832823c6909f76c84ace81dc445
3
+ size 4069678880
selora-v047-answer.f16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7584a04e13dd3e4973bfeb89ed01b7b29b3288d768d7bee15bcea3de89ff50c7
3
  size 14957792
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9ec14105e7119675a75c1e166c132298886f4a25e57628c4ef720fce2080171e
3
  size 14957792
selora-v047-automation.f16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:32e5633f0e554fec5e336d993647065f85a4fffd348366cdc3417ca34824b527
3
- size 59791968
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1bdd8c54cb36332889498a67046f01b55de7e5ac019e350419ba98a9c104f78f
3
+ size 37374880
selora-v047-clarification.f16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:75c06c0efbd9ca1e7108ec6ec5811f68e2dc5639d01ef583148a151b11398847
3
  size 9977056
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:071558279b1bc8c8609fc63a1524604aa7fcf721c5bd6d3ecdf1f55ad16f5e1a
3
  size 9977056
selora-v047-command.f16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:49ba46bc054259409f5cd52eb3d9971101ed858fba27c6678c89e145815af515
3
  size 19938528
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b10b5131157698400ee9fafac08ab4101b81230a79ad067eb8f89bd4e29a6273
3
  size 19938528