# Getting started
## 1. Add the package and let models provision
Add `com.sky.ondeviceagent` to your project (Package Manager → *Add package from git URL*, or reference
a local clone under `Packages/`). The six `com.sky.sentis.*` model packages are declared as hard
dependencies and are pulled in with it.
On-device models are **not** vendored in this repo — they are fetched on demand:
- **Sentis models** (wake-word, VAD, Whisper STT, E5 text embeddings, Supertonic TTS, YOLOX vision)
download from Hugging Face (`Sky-Kim/com.sky.sentis.*`) into each model package's `Models~/` folder
on first Editor load (and again before a player build). An Editor step then copies them into
`StreamingAssets/Model/` so the player ships them. No manual download step.
- **On-device LLM** (Android) streams from Hugging Face on first launch; see
[android-llm.md](android-llm.md).
See [../THIRD_PARTY_NOTICES.md](../THIRD_PARTY_NOTICES.md) for each model's source and license.
### YOLO detector model (YOLOX, Apache-2.0)
The vision detector uses **[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) (Apache-2.0)**. Do
**not** use Ultralytics weights (`yolo26n`, AGPL-3.0). The weights file `yolox_fp16.sentis` belongs to
the `com.sky.sentis.yolox` package and is provisioned from Hugging Face like the other Sentis models;
`CocoYoloDetector` loads it at runtime.
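The decoder expects a single output tensor of shape `[1, N, 5+C]` (or transposed), where each of the
`N` candidate rows is `[cx, cy, w, h, obj, classes...]` with `C` class scores, in input-pixel
coordinates. The input is an RGB NCHW tensor with values in `[0,1]`. NMS runs on the C# side.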
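To make the output layout concrete, here is a minimal Python sketch of decoding `[N, 5+C]` rows and applying greedy per-class NMS. It is illustrative only: the package's real decoder is the C# `CocoYoloDetector`, and the function names, the thresholds (0.25 and 0.45), and the choice of `obj × class score` as the final confidence are assumptions, not the package's settings. A transposed output would need to be transposed back to rows first.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score, class_id) in input-pixel coordinates
Box = Tuple[float, float, float, float, float, int]


def decode_rows(rows: List[List[float]], score_thresh: float = 0.25) -> List[Box]:
    """Convert [cx, cy, w, h, obj, classes...] rows into corner-format boxes."""
    boxes: List[Box] = []
    for row in rows:
        cx, cy, w, h, obj = row[:5]
        class_scores = row[5:]
        # Pick the best class; the combined score is an assumed convention.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = obj * class_scores[class_id]
        if score < score_thresh:
            continue
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, class_id))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float = 0.45) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop same-class overlaps."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(box[5] != k[5] or iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```

Calling `nms(decode_rows(rows))` on the rows of one image yields the surviving detections, highest score first.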
### Sample knowledge index (optional, for RAG)
The Voice Assistant sample ships a pre-built LightRAG index in its `StreamingAssets/VoiceAgent/DB/`
folder, so retrieval works out of the box. To rebuild it from the synthetic corpus in
`StreamingAssets/Knowledge/`, use the **KnowledgeIngest** Editor tool (menu added by
`Runtime/AgentCore/Editor/KnowledgeIngestMenu.cs`). Rebuilding requires a running Ollama server to
serve the ingest model. Without an index the agent still runs; only knowledge retrieval is unavailable.
## 2. Install the desktop LLM (Ollama)
The Editor/desktop path talks to an [Ollama](https://ollama.com) server.
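```bash
# Install Ollama (see ollama.com), then start the server
# (skip if the Ollama app or service is already running):
ollama serve              # default endpoint http://localhost:11434
# In a second terminal, pull the model configured on the agent:
ollama pull gemma4:e2b    # see the note below about this tag
```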
> **Note:** `gemma4:e2b` is the tag configured in code (`AgentBuilder.Model` / the agent's `m_Model`),
> not a public Ollama registry tag — the real Gemma tags are `gemma` / `gemma2` / `gemma3` / `gemma3n`.
> Running `ollama pull gemma4:e2b` as-is will fail with "model not found" unless you have a matching
> local model tagged that way. Either tag a local model as `gemma4:e2b`, or change `m_Model` on the
> agent component in the Inspector to a tag you have pulled.
The default endpoint and model are configurable on the agent component in the Inspector.
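Before pressing Play, it can help to confirm that the server is reachable and that the configured tag exists locally. The following Python sketch queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `model_available`, `fetch_tags`, and `check` are hypothetical, and the default endpoint and tag are simply the values mentioned in this guide.

```python
import json
import urllib.request


def model_available(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response payload."""
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return model in names


def fetch_tags(endpoint: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def check(endpoint: str = "http://localhost:11434", model: str = "gemma4:e2b") -> str:
    """Summarize whether the server is up and the model tag is present."""
    try:
        payload = fetch_tags(endpoint)
    except OSError as err:  # covers connection refused, timeouts, URLError
        return f"Ollama not reachable at {endpoint}: {err}"
    if model_available(payload, model):
        return f"{model} is available"
    return f"{model} not found locally; pull or tag it, or change the agent's model"
```

Running `print(check())` reports one of the three outcomes; pass a different `endpoint` or `model` if the Inspector values were changed.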
## 3. Run the sample
1. Use **Unity 6000.4.8f1** (Unity 6.x). Other versions are untested.
2. Open the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) project, which
references this package and wires the full pipeline into a scene.
3. Open the sample scene, press **Play**, say the wake word, then ask a question (e.g. a URP/rendering
question to exercise the bundled knowledge base, or any general question to exercise web search).
See the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) repository for sample
details.
## 4. Android on-device LLM (optional)
To run the LLM fully on-device on Android instead of via Ollama, see [android-llm.md](android-llm.md).
In short: side-load (or download) a `.litertlm` model; the `llm-release.aar` bridge ships in this
package.
## Troubleshooting
- **No response from the agent (desktop):** confirm `ollama serve` is running and the configured
model is pulled. Check the endpoint in the Inspector.
- **Models missing at runtime:** confirm the Sentis model provisioning step ran (check the Editor
console on load / before build) so `StreamingAssets/Model/` is populated; provisioning needs network
access to Hugging Face.
- **Wake word not triggering:** the voice pipeline needs microphone permission; on desktop, confirm
the OS granted Unity microphone access.
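For the "models missing at runtime" case, a quick way to see whether provisioning populated the model folder is to list its contents from outside the Editor. This Python sketch is an assumption-laden helper, not part of the package: `list_model_files` is a hypothetical name, and filtering on the `.sentis` suffix is based only on the `yolox_fp16.sentis` file named earlier; other models may use different extensions.

```python
from pathlib import Path
from typing import List


def list_model_files(model_dir: Path, suffix: str = ".sentis") -> List[str]:
    """Return sorted relative paths of model files under model_dir.

    Returns an empty list when the directory does not exist, which
    usually means the provisioning/copy step has not run yet.
    """
    if not model_dir.is_dir():
        return []
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob(f"*{suffix}")
        if p.is_file()
    )
```

For example, `list_model_files(Path("Assets/StreamingAssets/Model"))` run from the project root (the `Assets/` prefix assumes the standard Unity layout) should return a non-empty list once provisioning has succeeded.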