# Getting started

## 1. Add the package and let models provision

Add `com.sky.ondeviceagent` to your project (Package Manager → *Add package from git URL*, or reference a local clone under `Packages/`). The six `com.sky.sentis.*` model packages are declared as hard dependencies and are pulled in with it.

On-device models are **not** vendored in this repo; they are fetched on demand:

- **Sentis models** (wake-word, VAD, Whisper STT, E5 text embeddings, Supertonic TTS, YOLOX vision) download from Hugging Face (`Sky-Kim/com.sky.sentis.*`) into each model package's `Models~/` folder on first Editor load (and again before a player build). An Editor step then copies them into `StreamingAssets/Model/` so the player ships them. No manual download step.
- **On-device LLM** (Android) streams from Hugging Face on first launch; see [android-llm.md](android-llm.md).

See [../THIRD_PARTY_NOTICES.md](../THIRD_PARTY_NOTICES.md) for each model's source and license.

### YOLO detector model (YOLOX, Apache-2.0)

The vision detector uses **[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) (Apache-2.0)**; do **not** use Ultralytics weights (`yolo26n`, AGPL-3.0). The weights ship in the `com.sky.sentis.yolox` package (`yolox_fp16.sentis`, provisioned from Hugging Face like the other Sentis models); `CocoYoloDetector` loads it at runtime. The decoder expects a single output `[1, N, 5+C]` (or transposed) of `[cx, cy, w, h, obj, classes...]` in input-pixel coordinates, and RGB NCHW input in `[0,1]`. NMS runs on the C# side.

### Sample knowledge index (optional, for RAG)

The Voice Assistant sample ships a pre-built LightRAG index in its `StreamingAssets/VoiceAgent/DB/`, so retrieval works out of the box. To rebuild it from the synthetic corpus in `StreamingAssets/Knowledge/`, use the **KnowledgeIngest** Editor tool (menu added by `Runtime/AgentCore/Editor/KnowledgeIngestMenu.cs`). Rebuilding requires a running Ollama for the ingest model.
Without an index the agent still runs; only knowledge retrieval is unavailable.

## 2. Install the desktop LLM (Ollama)

The Editor/desktop path talks to an [Ollama](https://ollama.com) server.

```bash
# install Ollama (see ollama.com), then:
ollama pull gemma4:e2b   # match the model configured on the agent
ollama serve             # default endpoint http://localhost:11434
```

> **Note:** `gemma4:e2b` is the tag configured in code (`AgentBuilder.Model` / the agent's `m_Model`),
> not a public Ollama registry tag; the real Gemma tags are `gemma` / `gemma2` / `gemma3` / `gemma3n`.
> Running `ollama pull gemma4:e2b` as-is will fail with "model not found" unless you have a matching
> local model tagged that way. Either tag a local model as `gemma4:e2b`, or change `m_Model` on the
> agent component in the Inspector to a tag you have pulled.

The default endpoint and model are configurable on the agent component in the Inspector.

## 3. Run the sample

1. Use **Unity 6000.4.8f1** (Unity 6.x). Other versions are untested.
2. Open the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) project, which references this package and wires the full pipeline into a scene.
3. Open the sample scene, press **Play**, say the wake word, then ask a question (e.g. a URP/rendering question to exercise the bundled knowledge base, or any general question to exercise web search).

See the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) repository for sample details.

## 4. Android on-device LLM (optional)

To run the LLM fully on-device on Android instead of via Ollama, see [android-llm.md](android-llm.md). In short: side-load (or download) a `.litertlm` model; the `llm-release.aar` bridge ships in this package.

## Troubleshooting

- **No response from the agent (desktop):** confirm `ollama serve` is running and the configured model is pulled. Check the endpoint in the Inspector.
- **Models missing at runtime:** confirm the Sentis model provisioning step ran (check the Editor console on load / before build) so `StreamingAssets/Model/` is populated; provisioning needs network access to Hugging Face.
- **Wake word not triggering:** the voice pipeline needs microphone permission; on desktop, confirm the OS granted Unity microphone access.
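
For the "no response from the agent" case, the two things to confirm (server reachable, model pulled) can be checked from a terminal. The sketch below is one possible way to do that and is not part of this package: `has_model` and `check_ollama` are hypothetical helper names, and the default endpoint and tag are simply the values mentioned in this guide. It relies on Ollama's `GET /api/tags` endpoint, which lists locally available models, and uses a plain-text `grep` match instead of a JSON parser to avoid extra dependencies.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not shipped with the package.

# has_model TAG
# Reads the JSON from Ollama's /api/tags on stdin and succeeds if a model
# whose "name" equals TAG is listed. This is a text match, not a JSON parse;
# regex metacharacters in TAG (such as ".") are treated as patterns.
has_model() {
  local tag="$1"
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${tag}\""
}

# check_ollama [ENDPOINT] [TAG]
# Defaults are assumptions taken from this guide; pass the values configured
# on your agent component if they differ.
check_ollama() {
  local endpoint="${1:-http://localhost:11434}"
  local tag="${2:-gemma4:e2b}"
  local json
  # -f: fail on HTTP errors, -sS: quiet but still report errors
  if ! json="$(curl -fsS --max-time 5 "${endpoint}/api/tags")"; then
    echo "Ollama is not reachable at ${endpoint}; is 'ollama serve' running?"
    return 1
  fi
  if printf '%s' "$json" | has_model "$tag"; then
    echo "OK: ${tag} is available at ${endpoint}"
  else
    echo "Model ${tag} is not listed; pull or tag it, or change the configured model"
    return 2
  fi
}
```

After sourcing the file, run `check_ollama` with no arguments to test the defaults, or `check_ollama http://localhost:11434 gemma3` to test a different tag. A tag pulled without an explicit version is typically listed with a `:latest` suffix, so pass the full name as shown by `ollama list`.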