---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- text-generation
- qwen
- qwen3
- lora
- home-assistant
- home-automation
- smart-home
- tool-use
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# Selora AI

Qwen3 1.7B fine-tuned for Home Assistant with four specialist LoRA
adapters. The `answer` adapter additionally emits a `query_state` tool
envelope for live device-state queries against the Home Assistant REST
API. It is used by the [Selora AI Home Assistant
integration](https://gitlab.com/selorahomes/products/selora-ai/ha-integration)
and also runs directly under Ollama, llama.cpp, or vLLM.

## Specialists

| Adapter | Intent | Output shape |
| --- | --- | --- |
| `command` | "Turn off the kitchen lights" | `{intent:"command",response,calls:[…]}` |
| `automation` | "Wake up lights at 6:30 AM" | `{intent:"automation",automation:{triggers,actions,…}}` |
| `answer` | Q&A / small talk | `{intent:"answer",response}` |
| `clarification` | Ask the user a follow-up | `{intent:"clarification",response}` |

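
A client can dispatch on the `intent` field of each reply. The sketch below is one way to check a reply against the shapes in the table, assuming the model returns a single JSON object. `REQUIRED_KEYS` and `parse_envelope` are hypothetical names, not part of the integration, and the `query_state` tool envelope is not covered because its shape is not documented here.

```python
import json

# Top-level keys each intent is expected to carry, read off the table above.
REQUIRED_KEYS = {
    "command": {"response", "calls"},
    "automation": {"automation"},
    "answer": {"response"},
    "clarification": {"response"},
}


def parse_envelope(raw: str) -> dict:
    """Parse a specialist reply and check its top-level keys."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("expected a JSON object")
    intent = envelope.get("intent")
    if intent not in REQUIRED_KEYS:
        raise ValueError(f"unknown intent: {intent!r}")
    missing = REQUIRED_KEYS[intent] - set(envelope)
    if missing:
        raise ValueError(f"{intent} envelope is missing {sorted(missing)}")
    return envelope


reply = '{"intent": "answer", "response": "The hallway light is on."}'
print(parse_envelope(reply)["response"])
```

Only the presence of the top-level keys is checked; extra keys are allowed, and the contents of `calls` and `automation` are left to the caller.
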
Before each call, the HA integration's `selora_local` provider routes
the request to one of the four specialists using a cheap regex
pre-classifier, then sends it with `model: selora-v1-{specialist}`.
Backends that support multi-LoRA (llama-server's `/lora-adapters`
endpoint, vLLM's `--enable-lora` flag) activate the matching adapter.

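
As an illustration of that routing step, here is a minimal regex pre-classifier. The patterns are invented for this example and are not the integration's actual rules. The model name follows the `selora-v1-{specialist}` template with the adapter names from the table, which may differ from the names a given backend registers.

```python
import re

# Hypothetical rules, checked in order. The real pre-classifier is not shown in this card.
RULES = [
    ("automation", re.compile(r"\b(every|whenever|sunrise|sunset)\b|\bat \d{1,2}(:\d{2})?\b", re.I)),
    ("command", re.compile(r"^\s*(turn|switch|set|dim|open|close|lock|unlock)\b", re.I)),
    ("answer", re.compile(r"\?\s*$|^\s*(what|who|why|how|is|are|does)\b", re.I)),
]


def classify(text: str) -> str:
    """Return the first specialist whose pattern matches, else 'clarification'."""
    for specialist, pattern in RULES:
        if pattern.search(text):
            return specialist
    return "clarification"


def model_name(specialist: str) -> str:
    """Build the value sent in the request's `model` field."""
    return f"selora-v1-{specialist}"


for utterance in ["Turn off the kitchen lights", "Wake up lights at 6:30 AM"]:
    print(utterance, "->", model_name(classify(utterance)))
```

Checking the automation rule first keeps scheduled requests such as "Turn on the porch light at 7:00" from being treated as immediate commands.
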
## Quick start

### Ollama

```bash
ollama pull selora/commands
ollama run selora/commands
```

Modelfiles for all four specialists live in [`ollama/`](ollama/) and
are also published as separate Ollama models.

### llama.cpp

```bash
llama-server \
  --model qwen3_17b_base.Q4_K_M.gguf \
  --lora-init-without-apply \
  --lora qwen3_17b_command.lora.gguf \
  --lora qwen3_17b_automation.lora.gguf \
  --lora qwen3_17b_answer.lora.gguf \
  --lora qwen3_17b_clarification.lora.gguf \
  --ctx-size 8192
```

POST to `/lora-adapters` to switch the active LoRA before each
`/v1/chat/completions` call.

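
The switch can be sketched as follows. This assumes llama-server numbers the adapters from 0 in the order of the `--lora` flags above (a `GET /lora-adapters` call lists the actual ids) and that a scale of 1.0 is the intended strength. `ADAPTERS`, `lora_payload`, and `select_adapter` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed to match the order of the --lora flags passed to llama-server.
ADAPTERS = ["command", "automation", "answer", "clarification"]


def lora_payload(active: str) -> list:
    """Build the /lora-adapters body: scale 1.0 for one adapter, 0.0 for the rest."""
    if active not in ADAPTERS:
        raise ValueError(f"unknown adapter: {active!r}")
    return [
        {"id": index, "scale": 1.0 if name == active else 0.0}
        for index, name in enumerate(ADAPTERS)
    ]


def select_adapter(base_url: str, active: str) -> None:
    """POST the payload to a running llama-server."""
    request = urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=json.dumps(lora_payload(active)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


print(json.dumps(lora_payload("automation")))
# With a server running: select_adapter("http://localhost:8080", "automation")
```

The scales set this way apply server-wide, so a client that serves concurrent requests would need to serialize the switch and the chat call that follows it.
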
### vLLM (cloud)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ./qwen3_17b_hf \
  --enable-lora --max-loras 4 --max-lora-rank 32 \
  --lora-modules \
    selora-v1-commands=/path/to/peft/command \
    selora-v1-automations=/path/to/peft/automation \
    selora-v1-answers=/path/to/peft/answer \
    selora-v1-clarifications=/path/to/peft/clarification
```

vLLM activates the matching LoRA based on the request's `model` field;
no extra routing layer needed.

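
A request that selects an adapter this way might look like the sketch below. It assumes the server listens on vLLM's default port 8000 and uses the module names registered in the command above. `chat_body` and `send_chat` are hypothetical helpers, and the system prompt is left as a parameter because the trained prompts live in `prompts/`.

```python
import json
import urllib.request


def chat_body(model: str, user_text: str, system_prompt: str = "", max_tokens: int = 384) -> dict:
    """Build an OpenAI-style chat request; `model` picks the LoRA module."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": model,
        "messages": messages,
        "temperature": 0.0,
        "max_tokens": max_tokens,
    }


def send_chat(base_url: str, body: dict) -> dict:
    """POST the body to /v1/chat/completions and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = chat_body("selora-v1-commands", "Turn off the kitchen lights")
print(json.dumps(body, indent=2))
# With a server running: send_chat("http://localhost:8000", body)
```
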
## Generation parameters

```json
{
  "temperature": 0.0,
  "repeat_penalty": 1.15,
  "repeat_last_n": 256,
  "max_tokens": 384,
  "stop": ["<|im_end|>", "<|endoftext|>"]
}
```

Raise `max_tokens` to 1536 for automation requests, whose JSON output
is longer.

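
One way to apply the per-specialist override in client code is sketched below. `BASE_PARAMS` and `generation_params` are hypothetical names; the values are the ones listed above.

```python
import copy

# Defaults from the JSON block above.
BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.15,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def generation_params(specialist: str) -> dict:
    """Return a fresh copy of the defaults, with the automation override applied."""
    params = copy.deepcopy(BASE_PARAMS)
    if specialist == "automation":
        params["max_tokens"] = 1536
    return params


print(generation_params("automation")["max_tokens"])
print(generation_params("command")["max_tokens"])
```

Returning a deep copy keeps callers from mutating the shared defaults, including the `stop` list.
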
## Training

Base: [Qwen3 1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), fine-tuned
with [Apple mlx-lm](https://github.com/ml-explore/mlx-examples). Each
specialist has its own LoRA (rank 8–28, scale 20) trained on a curated
HA-domain corpus (forum threads, HA docs, synthetic command /
automation pairs). System prompts are trained per specialist; see
[`prompts/`](prompts/). The `answer` adapter went through a sequential
continuation pass that added a `query_state` tool envelope on top of
the original answer-only training distribution; the result is
preserved in the augmented `prompts/answers.txt` and the
`Modelfile.answers` SYSTEM block.

## Evaluation

10/10 parity pass rate on the four-intent suite (command, automation,
answer, and clarification, plus screenshot regressions). Validator and
scenarios live in [`parity/`](parity/).

## Files in this bundle

| Artifact | Purpose | Distribution |
| --- | --- | --- |
| `qwen3_17b_base.IQ4_XS.gguf` | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
| `qwen3_17b_{intent}.lora.gguf` (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
| `Modelfile.{intent}` (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| `prompts/{intent}.txt` (×4) | Plain-text trained prompts (reference / testing) | this repo |

The full-precision (f16) base and HF safetensors set used by vLLM /
TGI / SageMaker live separately in the cloud bundle and are not yet
mirrored to Hugging Face.

## Citation

```bibtex
@misc{selora-ai-2026,
  title  = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
  author = {{Selora Homes}},
  year   = {2026},
  url    = {https://huggingface.co/selorahomes/Selora-AI}
}
```

Base model citation: Qwen Team, *Qwen3 Technical Report* (2025).

## License

Apache-2.0 (matches the Qwen3 base license).