---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
  - text-generation
  - qwen
  - qwen3
  - lora
  - home-assistant
  - home-automation
  - smart-home
  - tool-use
language:
  - en
library_name: transformers
pipeline_tag: text-generation
---

# Selora AI

Qwen3 1.7B fine-tuned for Home Assistant with four specialist LoRA
adapters. The `answer` adapter additionally emits a `query_state` tool
envelope for live device-state queries against the Home Assistant REST
API. Used by the [Selora AI Home Assistant
integration](https://gitlab.com/selorahomes/products/selora-ai/ha-integration);
also runnable directly via Ollama, llama.cpp, or vLLM.

## Specialists

| Adapter | Intent | Output shape |
| --- | --- | --- |
| `command` | "Turn off the kitchen lights" | `{intent:"command",response,calls:[…]}` |
| `automation` | "Wake up lights at 6:30 AM" | `{intent:"automation",automation:{triggers,actions,…}}` |
| `answer` | Q&A / small talk | `{intent:"answer",response}` |
| `clarification` | Ask the user a follow-up | `{intent:"clarification",response}` |

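As a rough illustration of the shapes in the table, the sketch below checks that a reply parses as JSON and carries the top-level keys listed for its intent. It assumes replies are strict JSON; `REQUIRED_KEYS` and `check_shape` are hypothetical names, fields elided with `…` in the table are not checked, and the `query_state` tool envelope is not covered because its shape is not given here.

```python
import json

# Top-level keys per intent, copied from the "Output shape" column.
# Fields elided with "…" in the table are deliberately left unchecked.
REQUIRED_KEYS = {
    "command": {"intent", "response", "calls"},
    "automation": {"intent", "automation"},
    "answer": {"intent", "response"},
    "clarification": {"intent", "response"},
}


def check_shape(raw: str) -> list[str]:
    """Return the problems found in one model reply (empty list if none)."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(reply, dict):
        return ["top-level value is not an object"]

    intent = reply.get("intent")
    if not isinstance(intent, str) or intent not in REQUIRED_KEYS:
        return [f"unknown intent: {intent!r}"]

    missing = sorted(REQUIRED_KEYS[intent] - set(reply))
    problems = [f"missing key: {key}" for key in missing]

    if intent == "command" and not isinstance(reply.get("calls", []), list):
        problems.append("calls is not a list")
    if intent == "automation":
        automation = reply.get("automation")
        if isinstance(automation, dict):
            # Only the two sub-keys named in the table are checked.
            problems += [
                f"missing key: automation.{key}"
                for key in ("triggers", "actions")
                if key not in automation
            ]
        elif automation is not None:
            problems.append("automation is not an object")
    return problems
```
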
The HA integration's `selora_local` provider uses a cheap regex
pre-classifier to assign each request to one of the four specialists,
then sends the request with `model: selora-v1-{specialist}`. Backends
that support multi-LoRA serving (llama-server's `/lora-adapters`
endpoint, vLLM's `--enable-lora`) activate the matching adapter.
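
A minimal sketch of that routing step, under stated assumptions: the regex patterns are hypothetical and are not the integration's actual rules, the fallback to `answer` is a guess, and how `clarification` gets selected is not described here, so the sketch never returns it. The model names are copied from the vLLM example in the Quick start section.

```python
import re

# Hypothetical patterns for illustration only. The first match wins.
RULES = [
    (
        "automation",
        re.compile(
            r"\b(every|whenever|when|at \d{1,2}(?::\d{2})?(?:\s*[ap]m)?|sunrise|sunset)\b",
            re.IGNORECASE,
        ),
    ),
    (
        "command",
        re.compile(r"^\s*(turn|switch|set|dim|open|close|lock|unlock)\b", re.IGNORECASE),
    ),
]

# Served model names as registered in the vLLM --lora-modules example.
MODEL_NAMES = {
    "command": "selora-v1-commands",
    "automation": "selora-v1-automations",
    "answer": "selora-v1-answers",
    "clarification": "selora-v1-clarifications",
}


def classify(text: str) -> str:
    """Pick a specialist for one user request; falls back to 'answer'."""
    for specialist, pattern in RULES:
        if pattern.search(text):
            return specialist
    return "answer"


def model_for(text: str) -> str:
    """Return the value to send in the request's `model` field."""
    return MODEL_NAMES[classify(text)]
```

Because the check is a handful of regex matches, it adds negligible latency in front of the model call.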

## Quick start

### Ollama

```bash
ollama pull selora/commands
ollama run selora/commands
```

Modelfiles for all four specialists live in [`ollama/`](ollama/) and
are also published as separate Ollama models.

### llama.cpp

```bash
llama-server \
  --model qwen3_17b_base.Q4_K_M.gguf \
  --lora-init-without-apply \
  --lora qwen3_17b_command.lora.gguf \
  --lora qwen3_17b_automation.lora.gguf \
  --lora qwen3_17b_answer.lora.gguf \
  --lora qwen3_17b_clarification.lora.gguf \
  --ctx-size 8192
```

`--lora-init-without-apply` loads all four adapters without applying
any of them. POST to `/lora-adapters` to switch the active LoRA before
each `/v1/chat/completions` call.
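
One way to do the switch from Python is sketched below. It assumes adapter ids follow the order of the `--lora` flags in the command above (worth confirming with `GET /lora-adapters`) and that the server listens on `http://localhost:8080`; `ADAPTER_IDS`, `lora_payload`, and `activate` are hypothetical names.

```python
import json
import urllib.request

# Assumed to match the order of the --lora flags; verify against
# the list returned by GET /lora-adapters on your server.
ADAPTER_IDS = {"command": 0, "automation": 1, "answer": 2, "clarification": 3}


def lora_payload(active: str) -> list[dict]:
    """Scale 1.0 for the chosen adapter and 0.0 for every other one."""
    if active not in ADAPTER_IDS:
        raise ValueError(f"unknown adapter: {active!r}")
    return [
        {"id": adapter_id, "scale": 1.0 if name == active else 0.0}
        for name, adapter_id in ADAPTER_IDS.items()
    ]


def activate(active: str, base_url: str = "http://localhost:8080") -> None:
    """POST the scales to llama-server (base_url is an assumed default)."""
    request = urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=json.dumps(lora_payload(active)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

The scales set this way are global to the server, so a client that serves concurrent requests would need to serialize the switch and the completion call that follows it.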

### vLLM (cloud)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ./qwen3_17b_hf \
  --enable-lora --max-loras 4 --max-lora-rank 32 \
  --lora-modules \
    selora-v1-commands=/path/to/peft/command \
    selora-v1-automations=/path/to/peft/automation \
    selora-v1-answers=/path/to/peft/answer \
    selora-v1-clarifications=/path/to/peft/clarification
```

vLLM activates the matching LoRA based on the request's `model` field;
no extra routing layer needed.
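
For illustration, a request against the server above might be built as follows. This is a sketch that assumes vLLM's OpenAI-compatible endpoint on `http://localhost:8000` and a system prompt loaded from the matching file in `prompts/`; `chat_body` and `send` are hypothetical helpers.

```python
import json
import urllib.request


def chat_body(model: str, system_prompt: str, user_text: str) -> dict:
    """Build a chat-completions body; `model` selects the LoRA adapter."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }


def send(body: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the body to the server (base_url is an assumed default)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `chat_body("selora-v1-commands", prompt, "Turn off the kitchen lights")` targets the command adapter registered by `--lora-modules`.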

## Generation parameters

```json
{
  "temperature": 0.0,
  "repeat_penalty": 1.15,
  "repeat_last_n": 256,
  "max_tokens": 384,
  "stop": ["<|im_end|>", "<|endoftext|>"]
}
```

Bump `max_tokens` to 1536 for automation requests (longer JSON output).
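
The per-intent override can be expressed as a small helper. A minimal sketch, with `BASE_PARAMS` copied from the JSON above and `params_for` as a hypothetical name:

```python
import copy

# Defaults copied from the JSON block above.
BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.15,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def params_for(intent: str) -> dict:
    """Return generation parameters, raising max_tokens for automations."""
    params = copy.deepcopy(BASE_PARAMS)  # keep the shared defaults untouched
    if intent == "automation":
        params["max_tokens"] = 1536
    return params
```

The parameter names are as given here; a backend that uses different names for the repetition settings would need them mapped before sending.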

## Training

Base: [Qwen3 1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), fine-tuned
with [Apple mlx-lm](https://github.com/ml-explore/mlx-examples). Each
specialist has its own LoRA (rank 8–28, scale 20) trained on a curated
HA-domain corpus (forum threads, HA docs, synthetic command /
automation pairs). System prompts are trained per specialist; see
[`prompts/`](prompts/). The `answer` adapter went through a sequential
continuation pass that added a `query_state` tool envelope on top of
the original answer-only training distribution; the augmented
`prompts/answers.txt` and the `Modelfile.answers` SYSTEM block carry
that change.

## Evaluation

10/10 parity pass rate on the four-intent suite (command, automation,
answer, and clarification, plus screenshot regressions). Validator and
scenarios live in [`parity/`](parity/).

## Files in this bundle

| Artifact | Purpose | Distribution |
| --- | --- | --- |
| `qwen3_17b_base.IQ4_XS.gguf` | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
| `qwen3_17b_{intent}.lora.gguf` (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
| `Modelfile.{intent}` (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| `prompts/{intent}.txt` (×4) | Plain-text trained prompts (reference / testing) | this repo |

The full-precision (f16) base and HF safetensors set used by vLLM /
TGI / SageMaker live separately in the cloud bundle and are not yet
mirrored to Hugging Face.

## Citation

```bibtex
@misc{selora-ai-2026,
  title  = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
  author = {{Selora Homes}},
  year   = {2026},
  url    = {https://huggingface.co/selora-homes/selora-ai}
}
```

Base model citation: Qwen Team, *Qwen3 Technical Report* (2025).

## License

Apache-2.0 (matches the Qwen3 base license).