| # Getting started |
|
|
| ## 1. Add the package and let models provision |
|
|
| Add `com.sky.ondeviceagent` to your project (Package Manager → *Install package from git URL*, or reference |
| a local clone under `Packages/`). The six `com.sky.sentis.*` model packages are declared as hard |
| dependencies and are pulled in with it. |
|
|
| On-device models are **not** vendored in this repo — they are fetched on demand: |
|
|
| - **Sentis models** (wake-word, VAD, Whisper STT, E5 text embeddings, Supertonic TTS, YOLOX vision) |
| download from Hugging Face (`Sky-Kim/com.sky.sentis.*`) into each model package's `Models~/` folder |
| on first Editor load (and again before a player build). An Editor step then copies them into |
| `StreamingAssets/Model/` so the player ships them. No manual download is required. |
| - **On-device LLM** (Android) streams from Hugging Face on first launch; see |
| [android-llm.md](android-llm.md). |
|
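To confirm that provisioning actually populated the player's model folder, a quick file count can help. The following is a minimal sketch, assuming the folder sits at `Assets/StreamingAssets/Model` relative to the project root (Unity's usual location for `StreamingAssets`). `count_models` and `MODEL_DIR` are hypothetical names for illustration, not part of the package.

```bash
# Hypothetical helper: count provisioned model files, ignoring Unity .meta files.
# MODEL_DIR is an assumption; adjust it to your project layout.
MODEL_DIR="${MODEL_DIR:-Assets/StreamingAssets/Model}"

count_models() {
  # Print 0 when the directory does not exist (provisioning has not run yet).
  if [ ! -d "$1" ]; then
    echo 0
    return 0
  fi
  # Count regular files; the arithmetic expansion strips any padding wc adds.
  n="$(find "$1" -type f ! -name '*.meta' | wc -l)"
  echo "$((n))"
}

echo "model files found: $(count_models "$MODEL_DIR")"
```

A count of 0 suggests the provisioning or copy step has not run yet, or that the path assumption above does not match your project.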
|
| See [../THIRD_PARTY_NOTICES.md](../THIRD_PARTY_NOTICES.md) for each model's source and license. |
|
|
| ### YOLO detector model (YOLOX, Apache-2.0) |
|
|
| The vision detector uses **[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) (Apache-2.0)** — do |
| **not** use Ultralytics weights (`yolo26n`, AGPL-3.0). The weights ship in the `com.sky.sentis.yolox` |
| package as `yolox_fp16.sentis` (provisioned from Hugging Face like the other Sentis models), and |
| `CocoYoloDetector` loads that file at runtime. |
|
|
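| The decoder expects a single output tensor of shape `[1, N, 5+C]` (or its transpose), where each of the |
| N rows is `[cx, cy, w, h, obj, classes...]` with box values in input-pixel coordinates and C is the |
| number of classes. The model takes RGB input in NCHW layout with values in `[0,1]`. Non-maximum |
| suppression (NMS) runs on the C# side. |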
|
|
| ### Sample knowledge index (optional, for RAG) |
|
|
| The Voice Assistant sample ships a pre-built LightRAG index in its `StreamingAssets/VoiceAgent/DB/`, so |
| retrieval works out of the box. To rebuild it from the synthetic corpus in |
| `StreamingAssets/Knowledge/`, use the **KnowledgeIngest** Editor tool (menu added by |
| `Runtime/AgentCore/Editor/KnowledgeIngestMenu.cs`). Rebuilding requires a running Ollama server to |
| host the ingest model. Without an index the agent still runs; only knowledge retrieval is unavailable. |
|
|
| ## 2. Install the desktop LLM (Ollama) |
|
|
| The Editor/desktop path talks to an [Ollama](https://ollama.com) server. |
|
|
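| ```bash |
| # Install Ollama (see ollama.com). `ollama pull` needs a running server, so start one first |
| # unless the Ollama app or service is already running: |
| ollama serve # default endpoint http://localhost:11434; blocks, so use a separate terminal |
| ollama pull gemma4:e2b # match the model configured on the agent |
| ``` |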
|
|
| > **Note:** `gemma4:e2b` is the tag configured in code (`AgentBuilder.Model` / the agent's `m_Model`), |
| > not a public Ollama registry tag — the real Gemma tags are `gemma` / `gemma2` / `gemma3` / `gemma3n`. |
| > Running `ollama pull gemma4:e2b` as-is will fail with "model not found" unless you have a matching |
| > local model tagged that way. Either tag a local model as `gemma4:e2b`, or change `m_Model` on the |
| > agent component in the Inspector to a tag you have pulled. |
|
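If you already have a model pulled under a different tag, `ollama cp` can alias it to the tag the agent expects. This is a minimal sketch: `gemma3n:e2b` as the source tag is only an assumption for illustration, `alias_model` is a hypothetical helper name, and nothing in this guide confirms how well a substitute model behaves with the agent.

```bash
# Hypothetical helper: copy a local model to a new tag (no re-download).
alias_model() {
  src="$1"   # tag you have already pulled (assumed, e.g. gemma3n:e2b)
  dst="$2"   # tag the agent is configured with
  ollama cp "$src" "$dst"
}

# Usage (requires a running Ollama server with the source model present):
#   alias_model gemma3n:e2b gemma4:e2b
#   ollama list    # the new tag should now appear in the list
```

Changing `m_Model` in the Inspector achieves the same result without creating an extra tag, so prefer that when you control the scene.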
|
| The default endpoint and model are configurable on the agent component in the Inspector. |
|
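Before pressing Play, it can be useful to check that the endpoint answers and that the configured model is present locally. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models. `OLLAMA_URL`, `MODEL`, `has_model`, and `check_ollama` are hypothetical names, and the defaults are assumptions that should mirror whatever is set in the Inspector.

```bash
# Assumed defaults; match them to the values set on the agent component.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-gemma4:e2b}"

# Read /api/tags JSON on stdin; succeed if a model with exactly this name is listed.
has_model() {
  # Strip whitespace so the fixed-string match works for compact or pretty JSON.
  tr -d '[:space:]' | grep -qF "\"name\":\"$1\""
}

# Report whether the server answers and whether the model is available locally.
check_ollama() {
  if ! tags="$(curl -fsS --max-time 5 "$OLLAMA_URL/api/tags")"; then
    echo "Ollama is not reachable at $OLLAMA_URL"
    return 1
  fi
  if printf '%s' "$tags" | has_model "$MODEL"; then
    echo "Ollama is up and $MODEL is available"
  else
    echo "Ollama is up, but $MODEL is not pulled"
    return 1
  fi
}
```

Run `check_ollama` after sourcing the snippet. The string match is deliberately simple; a JSON-aware tool would be more robust if one is available on your machine.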
|
| ## 3. Run the sample |
|
|
| 1. Use **Unity 6000.4.8f1** (Unity 6.x). Other versions are untested. |
| 2. Open the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) project, which |
| references this package and wires the full pipeline into a scene. |
| 3. Open the sample scene, press **Play**, say the wake word, then ask a question (e.g. a URP/rendering |
| question to exercise the bundled knowledge base, or any general question to exercise web search). |
|
|
| See the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) repository for sample |
| details. |
|
|
| ## 4. Android on-device LLM (optional) |
|
|
| To run the LLM fully on-device on Android instead of via Ollama, see [android-llm.md](android-llm.md). |
| In short: side-load (or download) a `.litertlm` model; the `llm-release.aar` bridge ships in this |
| package. |
|
|
| ## Troubleshooting |
|
|
| - **No response from the agent (desktop):** confirm `ollama serve` is running and the configured |
| model is pulled. Check the endpoint in the Inspector. |
| - **Models missing at runtime:** confirm the Sentis model provisioning step ran (check the Editor |
| console on load / before build) so `StreamingAssets/Model/` is populated; provisioning needs network |
| access to Hugging Face. |
| - **Wake word not triggering:** the voice pipeline needs microphone permission; on desktop, confirm |
| the OS granted Unity microphone access. |
|
|