Publish v0.4.7 (retrained from v0.4.2 base with multi-state context)
- Modelfile.answer +41 -0
- Modelfile.automation +79 -0
- Modelfile.clarification +40 -0
- Modelfile.command +42 -0
- README.md +124 -151
- manifest.json +26 -31
- prompts/command_system_prompt.txt +2 -2
- qwen3_17b_base.f16.gguf +3 -0
- selora-v047-answer.f16.gguf +1 -1
- selora-v047-automation.f16.gguf +2 -2
- selora-v047-clarification.f16.gguf +1 -1
- selora-v047-command.f16.gguf +1 -1
Modelfile.answer
ADDED
@@ -0,0 +1,41 @@
+# Ollama Modelfile for Selora-AI v0.4.7 / answer specialist (Qwen3 1.7B + LoRA)
+# Build: ollama create selora-qwen-answer -f Modelfile.answer
+# Run: ollama run selora-qwen-answer
+
+FROM ./qwen3_17b_base.f16.gguf
+ADAPTER ./selora-v047-answer.f16.gguf
+
+# Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+# integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+# trained).
+TEMPLATE """{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}{{ if .Prompt }}<|im_start|>user
+/no_think {{ .Prompt }}<|im_end|>
+{{ end }}<|im_start|>assistant
+"""
+
+# v0.4.7 per-specialist system prompt (verbatim from prompts/answer_system_prompt.txt).
+SYSTEM """You are Selora AI's answer specialist for Home Assistant.
+
+Given a user question and the AVAILABLE ENTITIES list, respond with ONE JSON object only:
+{"r":"<response with {entity_id} placeholders where state is needed>","q":["<entity_id>",...]}
+
+Rules:
+- r: response template. Use {entity_id} placeholders for any state references; the consumer substitutes live state. Keep r short — 1-2 sentences max.
+- q: array of entity_ids to look up. Omit when no live state is needed.
+- Either field can be omitted if not used, but never both.
+- Only reference entity_ids that appear in AVAILABLE ENTITIES below.
+- Never invent state values; always template them via {entity_id}.
+- If the question is outside the home's scope, return {"r":"I can only answer questions about your home."}.
+
+Output JSON only — no narration, no markdown fences, no chain-of-thought."""
+
+# Generation params match training-time + integration defaults. repeat_penalty
+# is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+# distorts the slim-schema outputs (drops short keys like "r").
+PARAMETER temperature 0.0
+PARAMETER repeat_penalty 1.0
+PARAMETER repeat_last_n 256
+PARAMETER stop "<|im_end|>"
+PARAMETER stop "<|endoftext|>"
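The answer specialist's system prompt says the consumer substitutes live state into the `{entity_id}` placeholders of `r`. The following is a minimal sketch of that consumer step, not the integration's actual code: the `render_answer` helper and the `states` mapping are hypothetical names, and a real consumer would first fetch the ids listed in `q` from Home Assistant.

```python
import json
import re

# Matches {domain.object_id} placeholders such as {sensor.upstairs_temperature}.
PLACEHOLDER = re.compile(r"\{([a-z_]+\.[a-z0-9_]+)\}")


def render_answer(raw: str, states: dict[str, str]) -> str:
    """Parse the specialist's JSON reply and fill {entity_id} placeholders.

    `states` is a hypothetical entity_id -> state mapping supplied by the caller.
    """
    payload = json.loads(raw)
    template = payload.get("r", "")
    wanted = payload.get("q", [])
    # The prompt allows omitting either field, but never both.
    if not template and not wanted:
        raise ValueError("answer must contain at least one of 'r' or 'q'")

    def substitute(match: re.Match) -> str:
        entity_id = match.group(1)
        # Leave unknown placeholders visible rather than inventing a value.
        return states.get(entity_id, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


raw = '{"r":"It is {sensor.upstairs_temperature} upstairs.","q":["sensor.upstairs_temperature"]}'
print(render_answer(raw, {"sensor.upstairs_temperature": "21.5 °C"}))
```

Leaving an unresolved placeholder in the text, instead of guessing a value, follows the prompt's "never invent state values" rule on the consumer side as well.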
Modelfile.automation
ADDED
@@ -0,0 +1,79 @@
+# Ollama Modelfile for Selora-AI v0.4.7 / automation specialist (Qwen3 1.7B + LoRA)
+# Build: ollama create selora-qwen-automation -f Modelfile.automation
+# Run: ollama run selora-qwen-automation
+
+FROM ./qwen3_17b_base.f16.gguf
+ADAPTER ./selora-v047-automation.f16.gguf
+
+# Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+# integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+# trained).
+TEMPLATE """{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}{{ if .Prompt }}<|im_start|>user
+/no_think {{ .Prompt }}<|im_end|>
+{{ end }}<|im_start|>assistant
+"""
+
+# v0.4.7 per-specialist system prompt (verbatim from prompts/automation_system_prompt.txt).
+SYSTEM """You are Selora AI, an automation architect for Home Assistant. The user wants a recurring rule, schedule, or multi-step sequence saved as an automation.
+
+Return ONE JSON object with this shape and nothing else:
+{"intent":"automation","response":"<1-2 sentence explanation>","description":"<2-3 sentences: what it does, what triggers it, when it fires>","automation":{"alias":"<max 4 words>","description":"<...>","triggers":[<one-or-more>],"conditions":[<optional>],"actions":[<one-or-more>]}}
+
+RULES:
+- Use HA 2024+ plural keys: 'triggers', 'actions', 'conditions'.
+- Service calls use the 'service' key (e.g. 'light.turn_on', 'switch.turn_off', 'lock.lock', 'climate.set_temperature', 'media_player.turn_on', 'cover.close_cover', 'fan.turn_on', 'vacuum.start', 'persistent_notification.create', 'notify.notify').
+- State 'to'/'from' MUST be strings ("on"/"off"/"home"/"away"), never booleans.
+- Time values MUST be "HH:MM:SS" strings.
+- Durations MUST be "HH:MM:SS" or {"hours":N,"minutes":N,"seconds":N}, never raw integers.
+- automation.triggers MUST contain at least one entry. Supported trigger shapes:
+state: {"trigger":"state","entity_id":"<id>","to":"<state>","from":"<state>"}
+time: {"trigger":"time","at":"HH:MM:SS"}
+sun: {"trigger":"sun","event":"sunset"|"sunrise"}
+numeric_state: {"trigger":"numeric_state","entity_id":"<id>","above":N,"below":N}
+template: {"trigger":"template","value_template":"{{ ... }}"}
+Multiple triggers can fire the same automation (e.g. {"to":"failed"} AND {"to":"partially_failed"}).
+If unsure, default to {"trigger":"sun","event":"sunset"} or {"trigger":"time","at":"07:00:00"}.
+- automation.actions MUST contain at least one entry referencing entity_ids from AVAILABLE ENTITIES. Supported action shapes:
+service call: {"service":"<domain>.<action>","target":{"entity_id":"<id>"|[<ids>]},"data":{<params>}}
+notification: {"service":"persistent_notification.create","data":{"title":"<short>","message":"<body>"}}
+notify channel: {"service":"notify.notify","data":{"message":"<body>"}}
+Multiple actions run in sequence (turn on lights AND set thermostat AND notify).
+- automation.conditions are OPTIONAL but encouraged when the user mentions "only on", "if", "while", "between". Supported:
+state: {"condition":"state","entity_id":"<id>","state":"<value>"}
+numeric_state: {"condition":"numeric_state","entity_id":"<id>","above":N,"below":N}
+time: {"condition":"time","after":"HH:MM:SS","before":"HH:MM:SS","weekday":["mon",...]}
+sun: {"condition":"sun","after":"sunset","before":"sunrise"}
+- Use entity_ids ONLY from AVAILABLE ENTITIES — never invent placeholder names (e.g. "samsung_smart_light_kitchen") or YAML examples for the user to fill in.
+- If the user's request is vague ("make an automation", "something useful"), pick a sensible automation grounded in AVAILABLE ENTITIES (e.g. lights at sunset, lock front door at bedtime, presence-based hallway light, notify on motion) and emit the full automation JSON.
+- Do NOT duplicate anything in EXISTING AUTOMATIONS.
+- description field MUST be 2-3 sentences explaining what the automation does, what triggers it, and when it fires. List all targeted entities so the user can verify before enabling.
+- The `response` field MUST be 2-3 sentences of natural prose explaining what the automation does. Use **markdown bold** for key entity friendly_names and trigger states (e.g., **Coffee Maker Plug**, **failed**). Embed entity tile markers [[entity:<entity_id>|<friendly_name>]] when first naming a specific device — the panel renders these as live status cards. End with ONE follow-up question offering to enhance the automation (e.g., "Want me to also send a mobile push notification?", "Should I dim the lights at the same time?", "Want me to skip weekends?").
+- Output ONLY the JSON object.
+
+EXAMPLES:
+
+User: Turn on kitchen lights at sunset every day.
+AVAILABLE ENTITIES: light.kitchen (friendly_name="Kitchen Lights"); sensor.sun (friendly_name="Sun")
+EXISTING AUTOMATIONS: (none)
+Selora AI: {"intent":"automation","response":"This automation turns on [[entity:light.kitchen|Kitchen Lights]] every day at **sunset**. The lights will come on automatically as soon as the sun sets, so you walk into a lit kitchen at dusk. Want me to also dim them to 50% instead of full brightness?","description":"Turns on Kitchen Lights every day at sunset using the sun trigger. Fires whenever the sun event 'sunset' occurs, regardless of season. Targets entity light.kitchen.","automation":{"alias":"Kitchen Lights Sunset","description":"Turns on Kitchen Lights at sunset every day.","triggers":[{"trigger":"sun","event":"sunset"}],"conditions":[],"actions":[{"service":"light.turn_on","target":{"entity_id":"light.kitchen"},"data":{}}]}}
+
+User: If the front door is unlocked after midnight, send me a notification.
+AVAILABLE ENTITIES: lock.front_door (friendly_name="Front Door Lock"); notify.mobile_app (friendly_name="Mobile Push")
+EXISTING AUTOMATIONS: (none)
+Selora AI: {"intent":"automation","response":"This automation watches [[entity:lock.front_door|Front Door Lock]] and sends a **Mobile Push** notification whenever it changes to **unlocked** between midnight and 6 AM. It only fires during those overnight hours so it doesn't spam you during the day. Want me to also flash the entryway light when this happens?","description":"Sends a mobile push when the Front Door Lock state changes to 'unlocked' during overnight hours (00:00-06:00). Triggers on state change; only acts inside the time window. Targets lock.front_door for the trigger and notify.mobile_app for the action.","automation":{"alias":"Late Night Unlock Alert","description":"Notify mobile if front door unlocked between midnight and 6 AM.","triggers":[{"trigger":"state","entity_id":"lock.front_door","to":"unlocked"}],"conditions":[{"condition":"time","after":"00:00:00","before":"06:00:00"}],"actions":[{"service":"notify.mobile_app","data":{"message":"Front Door Lock unlocked overnight."}}]}}
+
+User: Make me something useful.
+AVAILABLE ENTITIES: light.kitchen (friendly_name="Kitchen Lights"); light.living_room (friendly_name="Living Room Lights"); lock.front_door (friendly_name="Front Door Lock"); sensor.sun (friendly_name="Sun")
+EXISTING AUTOMATIONS: (none)
+Selora AI: {"intent":"automation","response":"Here's a useful one: at sunset, this turns on [[entity:light.kitchen|Kitchen Lights]] and [[entity:light.living_room|Living Room Lights]], and locks [[entity:lock.front_door|Front Door Lock]] for the evening. It runs every day automatically so you don't have to think about it. Want me to also adjust the thermostat at the same time?","description":"Runs every day at sunset. Turns on Kitchen Lights and Living Room Lights, then locks the Front Door Lock. Targets light.kitchen, light.living_room, and lock.front_door.","automation":{"alias":"Evening Routine","description":"At sunset: lights on, front door locked.","triggers":[{"trigger":"sun","event":"sunset"}],"conditions":[],"actions":[{"service":"light.turn_on","target":{"entity_id":["light.kitchen","light.living_room"]},"data":{}},{"service":"lock.lock","target":{"entity_id":"lock.front_door"},"data":{}}]}}"""
+
+# Generation params match training-time + integration defaults. repeat_penalty
+# is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+# distorts the slim-schema outputs (drops short keys like "r").
+PARAMETER temperature 0.0
+PARAMETER repeat_penalty 1.0
+PARAMETER repeat_last_n 256
+PARAMETER stop "<|im_end|>"
+PARAMETER stop "<|endoftext|>"
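Several of the automation prompt's RULES are mechanical enough to check before an automation is saved. The sketch below covers a subset of them (non-empty triggers and actions, string `to`/`from`, `HH:MM:SS` time triggers, entity ids drawn from AVAILABLE ENTITIES). The `validate_automation` function and its error strings are hypothetical and are not taken from the integration; durations, conditions, and the prose fields are left unchecked.

```python
import json
import re

TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def _as_list(value) -> list:
    """Normalise a single entity_id or a list of them to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def validate_automation(raw: str, available: set[str]) -> list[str]:
    """Return rule violations for one specialist reply (empty list = pass)."""
    payload = json.loads(raw)
    errors: list[str] = []
    if payload.get("intent") != "automation":
        errors.append("intent must be 'automation'")
    automation = payload.get("automation", {})
    triggers = automation.get("triggers", [])
    actions = automation.get("actions", [])
    if not triggers:
        errors.append("triggers must contain at least one entry")
    if not actions:
        errors.append("actions must contain at least one entry")

    for trigger in triggers:
        # State 'to'/'from' must be strings, never booleans.
        for key in ("to", "from"):
            if key in trigger and not isinstance(trigger[key], str):
                errors.append(f"trigger '{key}' must be a string")
        # Time values must be "HH:MM:SS" strings.
        if trigger.get("trigger") == "time" and not TIME_RE.match(str(trigger.get("at", ""))):
            errors.append("time trigger 'at' must be HH:MM:SS")
        for entity_id in _as_list(trigger.get("entity_id")):
            if entity_id not in available:
                errors.append(f"unknown entity_id: {entity_id}")

    for action in actions:
        if "service" not in action:
            errors.append("action is missing the 'service' key")
        for entity_id in _as_list(action.get("target", {}).get("entity_id")):
            if entity_id not in available:
                errors.append(f"unknown entity_id: {entity_id}")
    return errors


reply = json.dumps({
    "intent": "automation",
    "automation": {
        "alias": "Kitchen Lights Sunset",
        "triggers": [{"trigger": "sun", "event": "sunset"}],
        "conditions": [],
        "actions": [{"service": "light.turn_on",
                     "target": {"entity_id": "light.kitchen"}, "data": {}}],
    },
})
print(validate_automation(reply, {"light.kitchen"}))  # prints []
```

Returning a list of violations instead of raising on the first one lets a caller show every problem at once, which suits a review-before-enable flow.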
Modelfile.clarification
ADDED
@@ -0,0 +1,40 @@
+# Ollama Modelfile for Selora-AI v0.4.7 / clarification specialist (Qwen3 1.7B + LoRA)
+# Build: ollama create selora-qwen-clarification -f Modelfile.clarification
+# Run: ollama run selora-qwen-clarification
+
+FROM ./qwen3_17b_base.f16.gguf
+ADAPTER ./selora-v047-clarification.f16.gguf
+
+# Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+# integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+# trained).
+TEMPLATE """{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}{{ if .Prompt }}<|im_start|>user
+/no_think {{ .Prompt }}<|im_end|>
+{{ end }}<|im_start|>assistant
+"""
+
+# v0.4.7 per-specialist system prompt (verbatim from prompts/clarification_system_prompt.txt).
+SYSTEM """You are Selora AI's clarification specialist for Home Assistant.
+
+When the user's request is ambiguous, respond with ONE JSON object only:
+{"q":"<question text>","o":["<option1>","<option2>",...]}
+
+Rules:
+- q: short, specific clarifying question. 1 sentence max.
+- o: optional array of suggested answers. Omit the o key when free-form input is appropriate.
+- Reference entity aliases from AVAILABLE ENTITIES when the ambiguity is about which entity.
+- Don't ask multiple questions in one turn — pick the single most important blocker.
+- Don't restate the user's full request; ask the one thing you need.
+
+Output JSON only — no narration, no markdown fences, no chain-of-thought."""
+
+# Generation params match training-time + integration defaults. repeat_penalty
+# is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+# distorts the slim-schema outputs (drops short keys like "r").
+PARAMETER temperature 0.0
+PARAMETER repeat_penalty 1.0
+PARAMETER repeat_last_n 256
+PARAMETER stop "<|im_end|>"
+PARAMETER stop "<|endoftext|>"
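The clarification schema is small: a required question `q` and an optional list of suggested answers `o`. As an illustration only, a consumer could parse it as below; `parse_clarification` is a hypothetical helper, and the sample reply is made up for the example.

```python
import json


def parse_clarification(raw: str) -> tuple[str, list[str]]:
    """Parse a {"q": ..., "o": [...]} reply into (question, options).

    An empty options list means the UI should accept free-form input.
    """
    payload = json.loads(raw)
    question = str(payload.get("q", "")).strip()
    if not question:
        raise ValueError("clarification must include a non-empty 'q'")
    options = payload.get("o", [])
    if not isinstance(options, list) or not all(isinstance(o, str) for o in options):
        raise ValueError("'o' must be a list of strings when present")
    return question, options


sample = '{"q":"Which light do you mean?","o":["Kitchen Lights","Living Room Lights"]}'
question, options = parse_clarification(sample)
print(question, options)
```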
Modelfile.command
ADDED
@@ -0,0 +1,42 @@
+# Ollama Modelfile for Selora-AI v0.4.7 / command specialist (Qwen3 1.7B + LoRA)
+# Build: ollama create selora-qwen-command -f Modelfile.command
+# Run: ollama run selora-qwen-command
+
+FROM ./qwen3_17b_base.f16.gguf
+ADAPTER ./selora-v047-command.f16.gguf
+
+# Qwen3 ChatML template (/no_think suppresses the thinking block — matches the
+# integration's enable_thinking=False chat-template-kwarg and how v0.4.7 was
+# trained).
+TEMPLATE """{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}{{ if .Prompt }}<|im_start|>user
+/no_think {{ .Prompt }}<|im_end|>
+{{ end }}<|im_start|>assistant
+"""
+
+# v0.4.7 per-specialist system prompt (verbatim from prompts/command_system_prompt.txt).
+SYSTEM """You are Selora AI's command specialist for Home Assistant.
+
+Given a user command and the AVAILABLE ENTITIES list, respond with ONE JSON object only:
+{"c":[{"s":"<service>","e":"<entity_id>","d":{<optional params>}}],"r":"<short confirmation>"}
+
+Rules:
+- c: ordered array of one or more service calls. Calls execute in array order.
+- s: HA service in "domain.action" form (e.g. "light.turn_on", "lock.lock", "media_player.play_media", "scene.turn_on").
+- e: canonical entity_id from AVAILABLE ENTITIES. Never use the human alias — always the entity_id.
+- d: service parameters object. Omit the d key entirely when there are no params (do not include "d":{}).
+- r: ≤ 1 sentence past-tense confirmation describing what got done (e.g. "Kitchen light on.").
+- The service domain (before the dot) must match the entity_id's domain. light.turn_on goes with light.* entities, lock.lock goes with lock.* entities, etc.
+- For multi-target requests, produce one c entry per (service, entity_id) pair.
+
+Output JSON only — no narration, no markdown fences, no chain-of-thought."""
+
+# Generation params match training-time + integration defaults. repeat_penalty
+# is 1.0 (not 1.15) because v0.4.7 was trained without it; 1.15 measurably
+# distorts the slim-schema outputs (drops short keys like "r").
+PARAMETER temperature 0.0
+PARAMETER repeat_penalty 1.0
+PARAMETER repeat_last_n 256
+PARAMETER stop "<|im_end|>"
+PARAMETER stop "<|endoftext|>"
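The command specialist's slim keys (`c`, `s`, `e`, `d`, `r`) have to be expanded before anything can call a Home Assistant service. The sketch below shows one way to do that while enforcing the two checks the prompt spells out: the entity id must come from AVAILABLE ENTITIES, and the service domain must match the entity's domain. The `expand_command` name and the expanded dict layout are assumptions made for this illustration, not the integration's format.

```python
import json


def expand_command(raw: str, available: set[str]) -> tuple[list[dict], str]:
    """Expand a slim command reply into ordered service calls plus confirmation."""
    payload = json.loads(raw)
    calls: list[dict] = []
    for call in payload.get("c", []):
        service, entity_id = call["s"], call["e"]
        if entity_id not in available:
            raise ValueError(f"unknown entity_id: {entity_id}")
        domain, action = service.split(".", 1)
        # The service domain (before the dot) must match the entity's domain.
        if domain != entity_id.split(".", 1)[0]:
            raise ValueError(f"{service} cannot target {entity_id}")
        calls.append({
            "domain": domain,
            "service": action,
            "target": {"entity_id": entity_id},
            # 'd' is omitted by the model when there are no parameters.
            "data": call.get("d", {}),
        })
    return calls, payload.get("r", "")


raw = (
    '{"c":[{"s":"light.turn_on","e":"light.kitchen","d":{"brightness_pct":50}},'
    '{"s":"lock.lock","e":"lock.front_door"}],'
    '"r":"Kitchen light on and front door locked."}'
)
calls, confirmation = expand_command(raw, {"light.kitchen", "lock.front_door"})
for c in calls:
    print(c)
print(confirmation)
```

The list keeps the array order of `c`, since the prompt states that calls execute in that order.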
README.md
CHANGED
|
@@ -1,173 +1,146 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
base_model: Qwen/Qwen3-1.7B
|
| 4 |
-
tags:
|
| 5 |
-
- text-generation
|
| 6 |
-
- qwen
|
| 7 |
-
- qwen3
|
| 8 |
-
- lora
|
| 9 |
-
- home-assistant
|
| 10 |
-
- home-automation
|
| 11 |
-
- smart-home
|
| 12 |
-
- iot
|
| 13 |
-
- instruction-tuned
|
| 14 |
-
- tool-use
|
| 15 |
-
language:
|
| 16 |
-
- en
|
| 17 |
-
library_name: transformers
|
| 18 |
-
pipeline_tag: text-generation
|
| 19 |
---
|
| 20 |
|
| 21 |
-
#
|
| 22 |
-
|
| 23 |
-
Selora AI is an instruction-tuned [Qwen3 1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) model purpose-built for
|
| 24 |
-
[**Home Assistant**](https://www.home-assistant.io/), the open-source **smart home**
|
| 25 |
-
platform. Four specialist LoRA adapters cover device control, **home automation**
|
| 26 |
-
authoring, Q&A, and clarification — each with its own trained system prompt and
|
| 27 |
-
output shape. The `answer` adapter also emits a `query_state` tool envelope for
|
| 28 |
-
live device-state queries against the Home Assistant REST API.
|
| 29 |
-
|
| 30 |
-
Selora AI powers the [Selora AI Home Assistant integration](https://gitlab.com/selorahomes/products/selora-ai/ha-integration) and runs locally on Apple Silicon, Linux, or Windows via
|
| 31 |
-
[Ollama](#ollama) or [llama.cpp](#llamacpp), or in the cloud via [vLLM](#vllm-cloud). It targets self-hosted **IoT** deployments where users want their
|
| 32 |
-
home automation assistant to stay private and offline-first.
|
| 33 |
-
|
| 34 |
-
## Use cases
|
| 35 |
-
|
| 36 |
-
- **Voice and chat control of smart-home devices** — "turn off the kitchen
|
| 37 |
-
lights", "set the thermostat to 68", "open the garage door" — resolved against
|
| 38 |
-
live Home Assistant entity state.
|
| 39 |
-
- **Natural-language home automation creation** — describe an automation in
|
| 40 |
-
plain English ("when the front door opens after 10pm, turn on the porch
|
| 41 |
-
light") and Selora returns valid Home Assistant YAML with a risk assessment
|
| 42 |
-
for review before deployment.
|
| 43 |
-
- **Scene and routine orchestration** — chain actions across multiple entities
|
| 44 |
-
("good night" → lock doors, dim bedroom lights, set thermostat) without
|
| 45 |
-
hand-writing scripts.
|
| 46 |
-
- **Q&A about your home** — "is the laundry running?", "what's the temperature
|
| 47 |
-
upstairs?" — answered via a `query_state` tool call against the HA REST API.
|
| 48 |
-
- **Privacy-first home assistant** — runs entirely on local hardware
|
| 49 |
-
(Raspberry Pi 5, Mac mini, NUC-class boxes) with no cloud dependency, so
|
| 50 |
-
device commands and home telemetry never leave the LAN.
|
| 51 |
-
|
| 52 |
-
## Specialists
|
| 53 |
-
|
| 54 |
-
| Adapter | Intent | Output shape |
|
| 55 |
-
| --- | --- | --- |
|
| 56 |
-
| `command` | "Turn off the kitchen lights" | `{intent:"command",response,calls:[…]}` |
|
| 57 |
-
| `automation` | "Wake up lights at 6:30 AM" | `{intent:"automation",automation:{triggers,actions,…}}` |
|
| 58 |
-
| `answer` | Q&A / small talk | `{intent:"answer",response}` |
|
| 59 |
-
| `clarification` | Ask the user a follow-up | `{intent:"clarification",response}` |
|
| 60 |
-
|
| 61 |
-
The HA integration's `selora_local` provider classifies each request to
|
| 62 |
-
one of the four specialists before the call (cheap regex
|
| 63 |
-
pre-classifier), then sends the request with `model:
|
| 64 |
-
selora-v1-{specialist}`. Backends that support multi-LoRA
|
| 65 |
-
(llama-server's `/lora-adapters`, vLLM `--enable-lora`) activate the
|
| 66 |
-
matching adapter.
|
| 67 |
-
|
| 68 |
-
## Quick start
|
| 69 |
-
|
| 70 |
-
### Ollama
|
| 71 |
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 75 |
```
|
| 76 |
|
| 77 |
-
|
| 78 |
-
are also published as separate Ollama models.
|
| 79 |
|
| 80 |
-
###
|
| 81 |
|
| 82 |
-
```
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 91 |
```
|
| 92 |
|
| 93 |
-
|
| 94 |
-
`/v1/chat/completions` call.
|
| 95 |
|
| 96 |
-
###
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 97 |
|
| 98 |
```bash
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
selora-v1-automations=/path/to/peft/automation \
|
| 105 |
-
selora-v1-answers=/path/to/peft/answer \
|
| 106 |
-
selora-v1-clarifications=/path/to/peft/clarification
|
| 107 |
```
|
| 108 |
|
| 109 |
-
|
| 110 |
-
no extra routing layer needed.
|
| 111 |
|
| 112 |
-
|
| 113 |
|
| 114 |
-
```
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
"max_tokens": 384,
|
| 120 |
-
"stop": ["<|im_end|>", "<|endoftext|>"]
|
| 121 |
-
}
|
| 122 |
```
|
| 123 |
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
##
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
automation pairs). System prompts trained per-specialist; see
|
| 133 |
-
[`prompts/`](prompts/). The `answer` adapter went through a sequential
|
| 134 |
-
continuation pass that added a `query_state` tool envelope on top of
|
| 135 |
-
the original answer-only training distribution; that's preserved in
|
| 136 |
-
the augmented `prompts/answers.txt` and the `Modelfile.answers` SYSTEM
|
| 137 |
-
block.
|
| 138 |
-
|
| 139 |
-
## Evaluation
|
| 140 |
-
|
| 141 |
-
10/10 parity pass rate on the four-intent suite (command, automation,
|
| 142 |
-
answer, clarification — plus screenshot regressions). Validator and
|
| 143 |
-
scenarios live in [`parity/`](parity/).
|
| 144 |
-
|
| 145 |
-
## Files in this bundle
|
| 146 |
-
|
| 147 |
-
| Artifact | Purpose | Distribution |
|
| 148 |
-
| --- | --- | --- |
|
| 149 |
-
| `qwen3_17b_base.IQ4_XS.gguf` | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
|
| 150 |
-
| `qwen3_17b_{intent}.lora.gguf` (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
|
| 151 |
-
| `Modelfile.{intent}` (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
|
| 152 |
-
| `prompts/{intent}.txt` (×4) | Plain-text trained prompts (reference / testing) | this repo |
|
| 153 |
-
|
| 154 |
-
The full-precision (f16) base and HF safetensors set used by vLLM /
|
| 155 |
-
TGI / SageMaker live separately in the cloud bundle and are not yet
|
| 156 |
-
mirrored to Hugging Face.
|
| 157 |
-
|
| 158 |
-
## Citation
|
| 159 |
-
|
| 160 |
-
```bibtex
|
| 161 |
-
@misc{selora-ai-2026,
|
| 162 |
-
title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
|
| 163 |
-
author = {{Selora Homes}},
|
| 164 |
-
year = {2026},
|
| 165 |
-
url = {https://huggingface.co/selora-homes/selora-ai}
|
| 166 |
-
}
|
| 167 |
```
|
| 168 |
|
| 169 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 170 |
|
| 171 |
-
##
|
| 172 |
|
| 173 |
-
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
**Artifacts:** [Selora-AI-Local Package Registry — package 60760758](https://gitlab.com/selorahomes/products/selora-ai/models/-/packages/60760758)
|
| 2 |
+
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 3 |
---
|
| 4 |
|
| 5 |
+
# SeloraAI v0.4.7 — Local
|
| 6 |
|
| 7 |
+
**Base:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
|
| 8 |
+
**Format:** GGUF **F16** base (~3.8 GB, no quantization) + 4 per-specialist LoRA adapters (F16)
|
| 9 |
+
**Architecture:** 4 LoRA specialists routed by a runtime classifier
|
| 10 |
+
**License:** Apache-2.0 (inherited from Qwen3)
|
| 11 |
+
|
| 12 |
+
## Specialists (4 LoRAs)
|
| 13 |
+
|
| 14 |
+
| Slot | Specialist | Iters | v0.4.3 final val | v0.4.7 final val | Δ |
|
| 15 |
+
|---:|---|---:|---:|---:|---:|
|
| 16 |
+
| 0 | command | 800 | 0.016 | **0.012** | -25% |
|
| 17 |
+
| 1 | automation | 900 | 0.005 | **0.004** | -20% |
|
| 18 |
+
| 2 | answer | 600 | 0.014 | **0.011** | -21% |
|
| 19 |
+
| 3 | clarification | 400 | 0.015 | **0.011** | -27% |
|
| 20 |
+
|
| 21 |
+
Every specialist lowers its final val (validation loss) by 20-27% relative to v0.4.3. The lower starting val on v0.4.7 (e.g. answer 3.58 → 3.01 at iter 1) reflects cleaner training data: the entity-block reconciliation and the negation-answer reshape mean the base model sees less "surprise" structure on the first eval pass.
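The Δ column can be rechecked from the two final-val columns. The snippet below is a minimal sketch of that arithmetic; the dictionary layout and the function name `relative_change_pct` are illustrative and not part of the Selora tooling.

```python
# Final val per specialist, copied from the table above: (v0.4.3, v0.4.7).
FINAL_VAL = {
    "command": (0.016, 0.012),
    "automation": (0.005, 0.004),
    "answer": (0.014, 0.011),
    "clarification": (0.015, 0.011),
}


def relative_change_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means lower)."""
    return (new - old) / old * 100.0


for name, (old, new) in FINAL_VAL.items():
    # Round to whole percent, matching the precision used in the table.
    print(f"{name}: {relative_change_pct(old, new):+.0f}%")
```

Rounded to whole percent, the four values come out as -25, -20, -21 and -27, which matches the table.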
|
| 22 |
+
|
| 23 |
+
## What changed since v0.4.3
|
| 24 |
+
|
| 25 |
+
### Recipe specialist dropped from the bundle
|
| 26 |
+
|
| 27 |
+
Recipe handling moves to a deterministic pipeline outside the model. The bundle is smaller (4 LoRAs instead of 5, ~120 MB → ~82 MB of LoRAs) and inference doesn't pay the recipe specialist's load cost. Consumer-side intent classifiers should map "install / set up / recipe" requests to the pipeline path, not to a model specialist.
|
| 28 |
+
|
| 29 |
+
### Entity-block format reconciled with the integration
|
| 30 |
+
|
| 31 |
+
`format_entities_block` in `scripts/gen_utils.py` now emits the exact per-line shape produced by `_format_entity_line` in `custom_components/selora_ai/llm_client/sanitize.py`:
|
| 32 |
+
|
| 33 |
+
```
|
| 34 |
+
AVAILABLE ENTITIES:
|
| 35 |
+
- entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
|
| 36 |
+
- entity_id=sensor.sun; state=below_horizon; friendly_name=Sun
|
| 37 |
```
|
| 38 |
|
| 39 |
+
Whitelisted attribute keys (`device_class`, `unit_of_measurement`, `percentage`, `current_temperature`, `target_temperature`, `brightness`) are appended in the same order on both sides. This eliminates the train-vs-inference drift that previously sent the model out-of-distribution on entity-context blocks.
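A rough sketch of the per-line shape, for readers who want to build the block themselves when calling the model outside the integration. This is not the code in `scripts/gen_utils.py` or `sanitize.py`: the function names are hypothetical, and two details are assumptions because the text above does not spell them out: that attribute tails use the same `; key=value` syntax as the leading fields, and that lines carry no leading indent.

```python
# Whitelist order as listed above; attributes are emitted in this order.
ATTRIBUTE_WHITELIST = (
    "device_class",
    "unit_of_measurement",
    "percentage",
    "current_temperature",
    "target_temperature",
    "brightness",
)


def sketch_entity_line(entity_id, state, friendly_name, attributes=None):
    """Format one entity line; non-whitelisted attributes are ignored."""
    parts = [
        f"entity_id={entity_id}",
        f"state={state}",
        f"friendly_name={friendly_name}",
    ]
    attributes = attributes or {}
    for key in ATTRIBUTE_WHITELIST:
        if key in attributes:
            # Assumption: tails reuse the "; key=value" syntax.
            parts.append(f"{key}={attributes[key]}")
    return "- " + "; ".join(parts)


def sketch_entities_block(entities):
    """Join the header and one line per entity (each entity is a dict)."""
    lines = ["AVAILABLE ENTITIES:"]
    lines.extend(sketch_entity_line(**entity) for entity in entities)
    return "\n".join(lines)


if __name__ == "__main__":
    print(sketch_entities_block([
        {"entity_id": "light.kitchen", "state": "off",
         "friendly_name": "Kitchen Lights"},
        {"entity_id": "sensor.sun", "state": "below_horizon",
         "friendly_name": "Sun"},
    ]))
```

Iterating over the whitelist rather than over the attribute dict is what keeps the key order identical regardless of how the caller's dict was built.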
|
|
|
|
| 40 |
|
| 41 |
+
### _negation_answer multi-turn reshape
|
| 42 |
|
| 43 |
+
`gen_multiturn._negation_answer` was a two-turn chain whose first assistant turn was a command envelope (`{"c":[...],"r":"…want me to also …?"}`) and whose second was the answer (`{"r":"OK, no problem."}`). Because the whole chain was routed to the answer specialist's training set and mlx-lm trains on every assistant turn in an example, the answer LoRA also received gradient from the command envelope and learned to occasionally emit `c` arrays.
|
| 44 |
+
|
| 45 |
+
The reshape folds the prior command turn into the user message as plain-English context, and only the final answer envelope reaches the LoRA's gradient:
|
| 46 |
+
|
| 47 |
+
Before (one example, 5 messages):
|
| 48 |
+
```
|
| 49 |
+
[system: answer prompt]
|
| 50 |
+
[user: "turn off the yard light"]
|
| 51 |
+
[assistant: {"c":[{"s":"light.turn_off","e":"light.yard"}],
|
| 52 |
+
"r":"Yard light off. Want me to turn off the kid bath light too?"}]
|
| 53 |
+
[user: "no thanks"]
|
| 54 |
+
[assistant: {"r":"OK, no problem."}]
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
After (one example, 3 messages — same conversational context):
|
| 58 |
+
```
|
| 59 |
+
[system: answer prompt]
|
| 60 |
+
[user: "earlier you turned off the yard light and asked if I
|
| 61 |
+
wanted the kid bath light off too — no thanks"]
|
| 62 |
+
[assistant: {"r":"OK, no problem."}]
|
| 63 |
```
|
| 64 |
|
| 65 |
+
Multi-turn awareness at inference is unchanged — the integration still feeds prior conversation history via `_SELORA_LOCAL_HISTORY_TURNS=3`. The cleaning was on the training-data side only.
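The reshape can be pictured as a small transformation over a chat-message list. The sketch below is illustrative only: `fold_prior_turn` is a hypothetical name, the plain-English summary is supplied by the caller rather than generated, and the separator is an arbitrary choice, so it should not be read as the actual `gen_multiturn` implementation.

```python
def fold_prior_turn(messages, context, separator=": "):
    """Collapse [system, user, assistant, user, assistant] into 3 messages.

    The earlier user/assistant pair is dropped and replaced by `context`,
    a plain-English summary prepended to the final user message, so the
    only assistant turn left in the example is the final answer envelope.
    """
    if len(messages) != 5:
        raise ValueError("expected a 5-message two-turn chain")
    system, _first_user, _command_turn, final_user, final_answer = messages
    folded_user = {
        "role": "user",
        "content": f"{context}{separator}{final_user['content']}",
    }
    return [system, folded_user, final_answer]


if __name__ == "__main__":
    chain = [
        {"role": "system", "content": "answer prompt"},
        {"role": "user", "content": "turn off the yard light"},
        {"role": "assistant",
         "content": '{"c":[{"s":"light.turn_off","e":"light.yard"}],'
                    '"r":"Yard light off. Want me to turn off the kid bath light too?"}'},
        {"role": "user", "content": "no thanks"},
        {"role": "assistant", "content": '{"r":"OK, no problem."}'},
    ]
    summary = ("earlier you turned off the yard light and asked if I "
               "wanted the kid bath light off too")
    for message in fold_prior_turn(chain, summary):
        print(message)
```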
|
|
|
|
| 66 |
|
| 67 |
+
### Pre-training audit script
|
| 68 |
+
|
| 69 |
+
`tools/audit.py` runs 22-29 checks before training (tools/generators/prompts/configs import cleanly, cross-layer specialist lists agree as sets, prompts are ASCII-safe, token-length p99 within the 4096 budget). Catches drift early instead of finding it after a training run.
|
| 70 |
+
|
| 71 |
+
## Training notes
|
| 72 |
+
|
| 73 |
+
- Framework: **mlx-lm** on M5 Max (64 GB unified memory)
|
| 74 |
+
- Optimizer: adam, learning rate 1e-4, batch size 4, max-seq-length 4096, grad-checkpoint enabled
|
| 75 |
+
- Data: synthetic corpus generated by `v0.4.7/scripts/gen_*.py` from 10 curated home specs + procedural variants. 45,694 examples total after filtering (1,036 examples over the 14k-char threshold were dropped): command 16,081 / automation 14,928 / answer 11,385 / clarification 3,300.
|
| 76 |
+
- Slot order is the **API contract**: 0=command, 1=automation, 2=answer, 3=clarification. Recipe is no longer a slot.
|
| 77 |
+
- LoRA ranks set via per-specialist YAML configs at `v0.4.7/training/configs/<spec>.yaml`.
|
| 78 |
+
|
| 79 |
+
## Files in this package
|
| 80 |
+
|
| 81 |
+
- `qwen3_17b_base.f16.gguf` (~3.8 GB) — F16 base, no quantization (keeps LoRA evaluation parity with training)
|
| 82 |
+
- 4 × `selora-v047-<specialist>.f16.gguf` (~10 MB to ~37 MB each, ~82 MB total)
|
| 83 |
+
- `manifest.json` — SHA-256 sums, sizes, slot order, training metadata
|
| 84 |
+
- 4 × `<specialist>_system_prompt.txt` — verbatim system prompts that produced the LoRAs
|
| 85 |
+
- 4 × `Modelfile.<specialist>` — optional [Ollama](https://ollama.com) Modelfiles (see "Running with Ollama" below)
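Since `manifest.json` records a size and SHA-256 sum for each artifact, a download can be checked before use. The following is a minimal sketch, assuming the manifest keys visible in this package (`base_model`, `loras`, `system_prompts`, each entry with `filename`, `size_bytes`, `sha256`) and assuming all files sit in one flat directory next to the manifest. The function names are illustrative.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so the multi-GB base never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entries(manifest):
    """Collect every entry that carries filename / size_bytes / sha256."""
    entries = [manifest["base_model"]]
    entries.extend(manifest.get("loras", []))
    entries.extend(manifest.get("system_prompts", {}).values())
    return entries


def verify_package(directory):
    """Return a list of problems; an empty list means everything matched."""
    directory = Path(directory)
    manifest = json.loads((directory / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest_entries(manifest):
        name = entry["filename"]
        path = directory / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size != entry["size_bytes"]:
            problems.append(f"{name}: size mismatch")
        elif sha256_of(path) != entry["sha256"]:
            problems.append(f"{name}: sha256 mismatch")
    return problems


if __name__ == "__main__":
    for problem in verify_package("."):
        print(problem)
```

The size check runs before hashing so a truncated download of the base file is reported without reading several gigabytes.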
|
| 86 |
+
|
| 87 |
+
## Running with Ollama (optional)
|
| 88 |
+
|
| 89 |
+
If you have [Ollama](https://ollama.com) installed, you can use the same base + LoRA artifacts through Ollama instead of llama.cpp. The Modelfiles in this package bake in the ChatML template (with `/no_think`), the per-specialist system prompt, and the generation parameters used at training time — so the only thing you need to do is build each specialist once and run it.
|
| 90 |
+
|
| 91 |
+
llama.cpp remains the reference runtime; Ollama is offered as a convenience for users who already have it on their system.
|
| 92 |
+
|
| 93 |
+
### Installing Ollama
|
| 94 |
+
|
| 95 |
+
These specialists require **Ollama 0.30 or later, installed via the official installer**. Two reasons to be specific:
|
| 96 |
+
|
| 97 |
+
- **Pre-0.30 Ollama lacks LoRA inference support** — `ollama run` fails with `"failed to initialize model: loras are not yet implemented"`. The build (`ollama create`) succeeds; only inference is blocked. Upgrade if you're on an older version.
|
| 98 |
+
- **`brew install ollama` is currently broken for all GGUF models on 0.30+.** The homebrew formula's `go build` does not produce the new `llama-server` binary that Ollama 0.30 split out for GGUF execution, so any GGUF model — including these — fails with `"error starting llama-server: llama-server binary not found"`. Tracking: [homebrew-core#285917](https://github.com/Homebrew/homebrew-core/issues/285917); fix queued in [PR #285963](https://github.com/Homebrew/homebrew-core/pull/285963). Until that lands, use the official installer.
|
| 99 |
+
|
| 100 |
+
Install via one of:
|
| 101 |
|
| 102 |
```bash
|
| 103 |
+
# macOS / Linux: one-line installer
|
| 104 |
+
curl -fsSL https://ollama.com/install.sh | sh
|
| 105 |
+
|
| 106 |
+
# macOS app bundle
|
| 107 |
+
# https://ollama.com/download
|
| 108 |
```
|
| 109 |
|
| 110 |
+
### Build
|
|
|
|
| 111 |
|
| 112 |
+
Download the base, all four LoRA `.f16.gguf` files, and the four `Modelfile.*` files into a single directory, then from that directory:
|
| 113 |
|
| 114 |
+
```bash
|
| 115 |
+
ollama create selora-qwen-command -f Modelfile.command
|
| 116 |
+
ollama create selora-qwen-automation -f Modelfile.automation
|
| 117 |
+
ollama create selora-qwen-answer -f Modelfile.answer
|
| 118 |
+
ollama create selora-qwen-clarification -f Modelfile.clarification
|
| 119 |
```
|
| 120 |
|
| 121 |
+
The base GGUF (`qwen3_17b_base.f16.gguf`) is stored once in Ollama's blob store and shared across all four specialists; only the ~10–37 MB LoRA adapter is added per slot.
|
| 122 |
+
|
| 123 |
+
### Run
|
| 124 |
+
|
| 125 |
+
```bash
|
| 126 |
+
ollama run selora-qwen-command
|
| 127 |
+
# >>> turn on the kitchen light
|
| 128 |
+
# {"c":[{"s":"light.turn_on","e":"light.kitchen"}],"r":"Kitchen light on."}
|
| 129 |
```
|
| 130 |
|
| 131 |
+
Or hit the Ollama HTTP API directly — the template and system prompt are already attached:
|
| 132 |
+
|
| 133 |
+
```bash
|
| 134 |
+
curl http://localhost:11434/api/generate -d '{
|
| 135 |
+
"model": "selora-qwen-command",
|
| 136 |
+
"prompt": "AVAILABLE ENTITIES:\n - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights\n\nturn on the kitchen light",
|
| 137 |
+
"stream": false
|
| 138 |
+
}'
|
| 139 |
+
```
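The same request can be issued from Python with only the standard library. This is a sketch under a few assumptions: Ollama is listening on its default port, the model was built under the name used in the Build step, and the specialist returns valid JSON in the `response` field (which is worth guarding against in real code). The helper names are illustrative.

```python
import json
import urllib.request


def build_generate_request(model, entities_block, user_text,
                           host="http://localhost:11434"):
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        # The entity block goes first, then a blank line, then the request.
        "prompt": f"{entities_block}\n\n{user_text}",
        "stream": False,
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(request, timeout=60):
    """Send the request and parse the model's JSON envelope."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": false the generated text is in the "response" field.
    return json.loads(body["response"])


if __name__ == "__main__":
    entities = (
        "AVAILABLE ENTITIES:\n"
        " - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights"
    )
    request = build_generate_request(
        "selora-qwen-command", entities, "turn on the kitchen light"
    )
    print(generate(request))
```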
|
| 140 |
|
| 141 |
+
### Notes
|
| 142 |
|
| 143 |
+
- Pick the specialist that matches the intent: `command` for direct device control, `automation` for "create an automation that…", `answer` for state questions, `clarification` when input is ambiguous. In a full HA integration the runtime classifier picks for you — when calling Ollama directly, you choose by picking which model to run.
|
| 144 |
+
- Keep `repeat_penalty` at `1.0`. Raising it (Ollama's default elsewhere is 1.1) measurably distorts the slim-schema output and can drop short keys like `r`.
|
| 145 |
+
- The user message must include an `AVAILABLE ENTITIES:` block in the exact format above — these LoRAs were trained on that shape and go out-of-distribution without it.
|
| 146 |
+
- Minimum Ollama version: **0.30** (LoRA inference). See "Installing Ollama" above for installer guidance.
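When there is no runtime classifier in front of Ollama, the caller has to do the routing. The sketch below shows one way to keep the slot order and the Ollama model names in one place. It assumes the models were created with the `selora-qwen-<intent>` names from the Build step; the helper names are hypothetical, and passing `repeat_penalty` through `options` is redundant if the Modelfile already sets it, so treat it as a defensive choice rather than a requirement.

```python
# Slot order is the API contract: index in this tuple == slot number.
SLOT_ORDER = ("command", "automation", "answer", "clarification")


def slot_for(intent):
    """Return the LoRA slot index for an intent name."""
    if intent not in SLOT_ORDER:
        raise ValueError(f"unknown intent: {intent!r}")
    return SLOT_ORDER.index(intent)


def ollama_model_for(intent):
    """Map an intent to the model name used in the `ollama create` step."""
    slot_for(intent)  # validates the intent name
    return f"selora-qwen-{intent}"


def build_payload(intent, entities_block, user_text):
    """Assemble a request body for Ollama's /api/generate endpoint."""
    return {
        "model": ollama_model_for(intent),
        "prompt": f"{entities_block}\n\n{user_text}",
        "stream": False,
        # Keep repeat_penalty at 1.0, per the note above.
        "options": {"repeat_penalty": 1.0},
    }


if __name__ == "__main__":
    print(build_payload(
        "answer",
        "AVAILABLE ENTITIES:\n - entity_id=sensor.sun; state=below_horizon; friendly_name=Sun",
        "is the sun up?",
    ))
```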
|
manifest.json
CHANGED
|
@@ -1,54 +1,50 @@
|
|
| 1 |
{
|
| 2 |
"name": "selora-ai-local",
|
| 3 |
"version": "0.4.7",
|
| 4 |
-
"description": "Selora AI v0.4.7 \u2014 Qwen3-1.7B
|
| 5 |
"base_model": {
|
| 6 |
"id": "Qwen/Qwen3-1.7B",
|
| 7 |
"format": "gguf",
|
| 8 |
-
"dtype": "
|
| 9 |
-
"filename": "qwen3_17b_base.
|
| 10 |
-
"size_bytes":
|
| 11 |
-
"sha256": "
|
| 12 |
},
|
| 13 |
"loras": [
|
| 14 |
{
|
| 15 |
"slot": 0,
|
| 16 |
"name": "command",
|
| 17 |
-
"rank": 16,
|
| 18 |
"filename": "selora-v047-command.f16.gguf",
|
| 19 |
"size_bytes": 19938528,
|
| 20 |
-
"sha256": "
|
| 21 |
},
|
| 22 |
{
|
| 23 |
"slot": 1,
|
| 24 |
"name": "automation",
|
| 25 |
-
"rank": 32,
|
| 26 |
"filename": "selora-v047-automation.f16.gguf",
|
| 27 |
-
"size_bytes":
|
| 28 |
-
"sha256": "
|
| 29 |
},
|
| 30 |
{
|
| 31 |
"slot": 2,
|
| 32 |
"name": "answer",
|
| 33 |
-
"rank": 12,
|
| 34 |
"filename": "selora-v047-answer.f16.gguf",
|
| 35 |
"size_bytes": 14957792,
|
| 36 |
-
"sha256": "
|
| 37 |
},
|
| 38 |
{
|
| 39 |
"slot": 3,
|
| 40 |
"name": "clarification",
|
| 41 |
-
"rank": 8,
|
| 42 |
"filename": "selora-v047-clarification.f16.gguf",
|
| 43 |
"size_bytes": 9977056,
|
| 44 |
-
"sha256": "
|
| 45 |
}
|
| 46 |
],
|
| 47 |
"system_prompts": {
|
| 48 |
"command": {
|
| 49 |
"filename": "command_system_prompt.txt",
|
| 50 |
-
"size_bytes":
|
| 51 |
-
"sha256": "
|
| 52 |
},
|
| 53 |
"automation": {
|
| 54 |
"filename": "automation_system_prompt.txt",
|
|
@@ -68,29 +64,28 @@
|
|
| 68 |
},
|
| 69 |
"runtime": {
|
| 70 |
"cache_prompt": true,
|
| 71 |
-
"ctx_size":
|
| 72 |
},
|
| 73 |
"training": {
|
| 74 |
"framework": "mlx-lm",
|
| 75 |
"base_model_repo": "Qwen/Qwen3-1.7B",
|
| 76 |
-
"
|
| 77 |
-
"
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
},
|
| 83 |
"iterations_per_specialist": {
|
| 84 |
-
"command":
|
| 85 |
-
"automation": 1800,
|
| 86 |
"answer": 600,
|
| 87 |
-
"clarification": 450
|
|
|
|
| 88 |
},
|
| 89 |
"examples_per_specialist": {
|
| 90 |
-
"command":
|
| 91 |
-
"
|
| 92 |
-
"
|
| 93 |
-
"
|
| 94 |
}
|
| 95 |
}
|
| 96 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"name": "selora-ai-local",
|
| 3 |
"version": "0.4.7",
|
| 4 |
+
"description": "Selora AI v0.4.7 \u2014 Qwen3-1.7B F16 base + 4 LoRA specialists with slim action-then-confirm output schemas. Multi-state entity context (per-entity attribute tails in AVAILABLE ENTITIES) for richer single-turn grounding. Inference: cache_prompt enabled to amortize system-prompt KV cache across requests.",
|
| 5 |
"base_model": {
|
| 6 |
"id": "Qwen/Qwen3-1.7B",
|
| 7 |
"format": "gguf",
|
| 8 |
+
"dtype": "F16",
|
| 9 |
+
"filename": "qwen3_17b_base.f16.gguf",
|
| 10 |
+
"size_bytes": 4069678880,
|
| 11 |
+
"sha256": "3e4009f0d96955a45f29aa77bded839d376d7832823c6909f76c84ace81dc445"
|
| 12 |
},
|
| 13 |
"loras": [
|
| 14 |
{
|
| 15 |
"slot": 0,
|
| 16 |
"name": "command",
|
|
|
|
| 17 |
"filename": "selora-v047-command.f16.gguf",
|
| 18 |
"size_bytes": 19938528,
|
| 19 |
+
"sha256": "b10b5131157698400ee9fafac08ab4101b81230a79ad067eb8f89bd4e29a6273"
|
| 20 |
},
|
| 21 |
{
|
| 22 |
"slot": 1,
|
| 23 |
"name": "automation",
|
|
|
|
| 24 |
"filename": "selora-v047-automation.f16.gguf",
|
| 25 |
+
"size_bytes": 37374880,
|
| 26 |
+
"sha256": "1bdd8c54cb36332889498a67046f01b55de7e5ac019e350419ba98a9c104f78f"
|
| 27 |
},
|
| 28 |
{
|
| 29 |
"slot": 2,
|
| 30 |
"name": "answer",
|
|
|
|
| 31 |
"filename": "selora-v047-answer.f16.gguf",
|
| 32 |
"size_bytes": 14957792,
|
| 33 |
+
"sha256": "9ec14105e7119675a75c1e166c132298886f4a25e57628c4ef720fce2080171e"
|
| 34 |
},
|
| 35 |
{
|
| 36 |
"slot": 3,
|
| 37 |
"name": "clarification",
|
|
|
|
| 38 |
"filename": "selora-v047-clarification.f16.gguf",
|
| 39 |
"size_bytes": 9977056,
|
| 40 |
+
"sha256": "071558279b1bc8c8609fc63a1524604aa7fcf721c5bd6d3ecdf1f55ad16f5e1a"
|
| 41 |
}
|
| 42 |
],
|
| 43 |
"system_prompts": {
|
| 44 |
"command": {
|
| 45 |
"filename": "command_system_prompt.txt",
|
| 46 |
+
"size_bytes": 1071,
|
| 47 |
+
"sha256": "9921c6fef09c6ebad4a2ed4fad1dbe7e76efe0bfe4e532bf7c7fe096864de6a4"
|
| 48 |
},
|
| 49 |
"automation": {
|
| 50 |
"filename": "automation_system_prompt.txt",
|
|
|
|
| 64 |
},
|
| 65 |
"runtime": {
|
| 66 |
"cache_prompt": true,
|
| 67 |
+
"ctx_size": 4096
|
| 68 |
},
|
| 69 |
"training": {
|
| 70 |
"framework": "mlx-lm",
|
| 71 |
"base_model_repo": "Qwen/Qwen3-1.7B",
|
| 72 |
+
"optimizer": "adam",
|
| 73 |
+
"learning_rate": 0.0001,
|
| 74 |
+
"batch_size": 4,
|
| 75 |
+
"max_seq_length": 4096,
|
| 76 |
+
"english_only": true,
|
| 77 |
+
"data_source": "synthetic \u2014 slim schemas in slim_schemas.md, generated by scripts/gen_{intent}.py from 10 curated home specs + procedural variants; service_matrix.py covers 49 (domain, service) pairs. tools.home_specs.diversify_states() injects multi-state attributes per training example.",
|
|
|
|
| 78 |
"iterations_per_specialist": {
|
| 79 |
+
"command": 750,
|
|
|
|
| 80 |
"answer": 600,
|
| 81 |
+
"clarification": 450,
|
| 82 |
+
"automation": 1050
|
| 83 |
},
|
| 84 |
"examples_per_specialist": {
|
| 85 |
+
"command": 8800,
|
| 86 |
+
"answer": 6600,
|
| 87 |
+
"clarification": 3300,
|
| 88 |
+
"automation": 6600
|
| 89 |
}
|
| 90 |
}
|
| 91 |
}
|
prompts/command_system_prompt.txt
CHANGED
|
@@ -5,11 +5,11 @@ Given a user command and the AVAILABLE ENTITIES list, respond with ONE JSON obje
|
|
| 5 |
|
| 6 |
Rules:
|
| 7 |
- c: ordered array of one or more service calls. Calls execute in array order.
|
| 8 |
-
- s: HA service in "domain.action" form
|
| 9 |
- e: canonical entity_id from AVAILABLE ENTITIES. Never use the human alias — always the entity_id.
|
| 10 |
- d: service parameters object. Omit the d key entirely when there are no params (do not include "d":{}).
|
| 11 |
- r: ≤ 1 sentence past-tense confirmation describing what got done (e.g. "Kitchen light on.").
|
| 12 |
-
- The service domain (before the dot) must match the entity_id's domain. light.turn_on goes with light.* entities,
|
| 13 |
- For multi-target requests, produce one c entry per (service, entity_id) pair.
|
| 14 |
|
| 15 |
Output JSON only — no narration, no markdown fences, no chain-of-thought.
|
|
|
|
| 5 |
|
| 6 |
Rules:
|
| 7 |
- c: ordered array of one or more service calls. Calls execute in array order.
|
| 8 |
+
- s: HA service in "domain.action" form (e.g. "light.turn_on", "lock.lock", "media_player.play_media", "scene.turn_on").
|
| 9 |
- e: canonical entity_id from AVAILABLE ENTITIES. Never use the human alias — always the entity_id.
|
| 10 |
- d: service parameters object. Omit the d key entirely when there are no params (do not include "d":{}).
|
| 11 |
- r: ≤ 1 sentence past-tense confirmation describing what got done (e.g. "Kitchen light on.").
|
| 12 |
+
- The service domain (before the dot) must match the entity_id's domain. light.turn_on goes with light.* entities, lock.lock goes with lock.* entities, etc.
|
| 13 |
- For multi-target requests, produce one c entry per (service, entity_id) pair.
|
| 14 |
|
| 15 |
Output JSON only — no narration, no markdown fences, no chain-of-thought.
|
qwen3_17b_base.f16.gguf
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3e4009f0d96955a45f29aa77bded839d376d7832823c6909f76c84ace81dc445
|
| 3 |
+
size 4069678880
|
selora-v047-answer.f16.gguf
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 14957792
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9ec14105e7119675a75c1e166c132298886f4a25e57628c4ef720fce2080171e
|
| 3 |
size 14957792
|
selora-v047-automation.f16.gguf
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1bdd8c54cb36332889498a67046f01b55de7e5ac019e350419ba98a9c104f78f
|
| 3 |
+
size 37374880
|
selora-v047-clarification.f16.gguf
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 9977056
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:071558279b1bc8c8609fc63a1524604aa7fcf721c5bd6d3ecdf1f55ad16f5e1a
|
| 3 |
size 9977056
|
selora-v047-command.f16.gguf
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 19938528
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b10b5131157698400ee9fafac08ab4101b81230a79ad067eb8f89bd4e29a6273
|
| 3 |
size 19938528
|