# Getting started

## 1. Add the package and let models provision

Add `com.sky.ondeviceagent` to your project (Package Manager → *Add package from git URL*, or reference
a local clone under `Packages/`). The six `com.sky.sentis.*` model packages are declared as hard
dependencies and are pulled in with it.

On-device models are **not** vendored in this repo — they are fetched on demand:

- **Sentis models** (wake-word, VAD, Whisper STT, E5 text embeddings, Supertonic TTS, YOLOX vision)
  download from Hugging Face (`Sky-Kim/com.sky.sentis.*`) into each model package's `Models~/` folder
  on first Editor load (and again before a player build). An Editor step then copies them into
  `StreamingAssets/Model/` so the player ships them. No manual download step.
- **On-device LLM** (Android) streams from Hugging Face on first launch; see
  [android-llm.md](android-llm.md).

See [../THIRD_PARTY_NOTICES.md](../THIRD_PARTY_NOTICES.md) for each model's source and license.

### YOLO detector model (YOLOX, Apache-2.0)

The vision detector uses **[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) (Apache-2.0)**. Do
**not** use Ultralytics weights (`yolo26n`, AGPL-3.0). The weights ship in the `com.sky.sentis.yolox`
package as `yolox_fp16.sentis` (provisioned from Hugging Face like the other Sentis models), and
`CocoYoloDetector` loads that file at runtime.

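The decoder expects a single output tensor of shape `[1, N, 5+C]` (or its transpose, `[1, 5+C, N]`),
where each of the `N` candidates is `[cx, cy, w, h, obj, classes...]` in input-pixel coordinates and
`C` is the number of classes. The input is an RGB NCHW tensor with values in `[0,1]`. NMS runs on the
C# side.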
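
As a rough illustration of the layout described above, the following Python sketch decodes rows of
`[cx, cy, w, h, obj, classes...]` into corner-format boxes and applies greedy per-class NMS. It is not
the package's C# decoder: the function names, the scoring rule (objectness times best class score,
with both assumed to be already activated), and the thresholds are assumptions chosen for the example.

```python
from typing import List, Tuple

# x1, y1, x2, y2, score, class_id
Box = Tuple[float, float, float, float, float, int]


def decode_rows(rows: List[List[float]], score_threshold: float = 0.25) -> List[Box]:
    """Convert [cx, cy, w, h, obj, classes...] rows into corner-format boxes."""
    boxes: List[Box] = []
    for row in rows:
        cx, cy, w, h, obj = row[:5]
        class_scores = row[5:]
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = obj * class_scores[class_id]  # assumed scoring rule
        if score >= score_threshold:
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, class_id))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.45) -> List[Box]:
    """Greedy NMS: visit boxes by descending score, drop same-class overlaps."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(box[5] != k[5] or iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    # Four made-up candidates with C = 2 classes.
    rows = [
        [100.0, 100.0, 50.0, 50.0, 0.9, 0.1, 0.8],
        [102.0, 101.0, 50.0, 50.0, 0.8, 0.1, 0.7],
        [300.0, 200.0, 40.0, 60.0, 0.9, 0.9, 0.05],
        [10.0, 10.0, 5.0, 5.0, 0.1, 0.5, 0.5],
    ]
    for box in nms(decode_rows(rows)):
        print(box)
```

If the model emits the transposed layout, `rows = [list(col) for col in zip(*channels_first)]` turns a
`[5+C, N]` nested list into the row form used here.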

### Sample knowledge index (optional, for RAG)

The Voice Assistant sample ships a pre-built LightRAG index in its `StreamingAssets/VoiceAgent/DB/`, so
retrieval works out of the box. To rebuild it from the synthetic corpus in
`StreamingAssets/Knowledge/`, use the **KnowledgeIngest** Editor tool (menu added by
`Runtime/AgentCore/Editor/KnowledgeIngestMenu.cs`). Rebuilding requires a running Ollama server to
serve the ingest model. Without an index the agent still runs; only knowledge retrieval is unavailable.

## 2. Install the desktop LLM (Ollama)

The Editor/desktop path talks to an [Ollama](https://ollama.com) server.

```bash
# install Ollama (see ollama.com), then:
ollama pull gemma4:e2b      # match the model configured on the agent
ollama serve                # default endpoint http://localhost:11434
```

> **Note:** `gemma4:e2b` is the tag configured in code (`AgentBuilder.Model` / the agent's `m_Model`),
> not a public Ollama registry tag — the real Gemma tags are `gemma` / `gemma2` / `gemma3` / `gemma3n`.
> Running `ollama pull gemma4:e2b` as-is will fail with "model not found" unless you have a matching
> local model tagged that way. Either tag a local model as `gemma4:e2b`, or change `m_Model` on the
> agent component in the Inspector to a tag you have pulled.

The default endpoint and model are configurable on the agent component in the Inspector.
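
To check from a script that the server is reachable and the configured tag is available locally, one
option is Ollama's `GET /api/tags` endpoint, which lists local models. The sketch below is a standalone
Python helper, not part of this package; `fetch_tags`, `model_names`, and `has_model` are hypothetical
names, and the endpoint and tag are the ones shown above.

```python
import json
import urllib.request
from typing import List


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, tag: str) -> bool:
    """True if the exact tag appears in the local model list."""
    return tag in model_names(tags_payload)


def fetch_tags(endpoint: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Query the Ollama server for its local models."""
    with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    tag = "gemma4:e2b"  # the tag configured on the agent
    try:
        payload = fetch_tags()
    except OSError as exc:  # connection refused, timeout, HTTP errors
        print(f"Ollama not reachable: {exc}")
    else:
        print(f"{tag} available locally: {has_model(payload, tag)}")
```

The comparison is an exact string match, so pass the same tag that the agent component shows in the
Inspector.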

## 3. Run the sample

1. Use **Unity 6000.4.8f1** (Unity 6.x). Other versions are untested.
2. Open the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) project, which
   references this package and wires the full pipeline into a scene.
3. Open the sample scene, press **Play**, say the wake word, then ask a question (e.g. a URP/rendering
   question to exercise the bundled knowledge base, or any general question to exercise web search).

See the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) repository for sample
details.

## 4. Android on-device LLM (optional)

To run the LLM fully on-device on Android instead of via Ollama, see [android-llm.md](android-llm.md).
In short: side-load (or download) a `.litertlm` model; the `llm-release.aar` bridge ships in this
package.

## Troubleshooting

- **No response from the agent (desktop):** confirm `ollama serve` is running and the configured
  model is pulled. Check the endpoint in the Inspector.
- **Models missing at runtime:** confirm the Sentis model provisioning step ran (check the Editor
  console on load / before build) so `StreamingAssets/Model/` is populated; provisioning needs network
  access to Hugging Face.
- **Wake word not triggering:** the voice pipeline needs microphone permission; on desktop, confirm
  the OS granted Unity microphone access.
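
For the "models missing at runtime" case, a quick way to see what provisioning actually copied is to
list the files under `StreamingAssets/Model/`. The Python sketch below assumes the folder sits at
`Assets/StreamingAssets/Model` relative to the project root (Unity's usual StreamingAssets location;
adjust `rel_dir` if your layout differs). `list_model_files` is a hypothetical helper, not a tool
shipped with the package.

```python
from pathlib import Path
from typing import List


def list_model_files(project_root: str,
                     rel_dir: str = "Assets/StreamingAssets/Model") -> List[str]:
    """Return model files under rel_dir, skipping Unity .meta files.

    An empty list means the folder is missing or provisioning copied nothing.
    """
    model_dir = Path(project_root) / rel_dir
    if not model_dir.is_dir():
        return []
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob("*")
        if p.is_file() and p.suffix != ".meta"
    )


if __name__ == "__main__":
    files = list_model_files(".")  # run from the Unity project root
    if files:
        print("\n".join(files))
    else:
        print("No model files found; check the Editor console for provisioning errors.")
```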