	# Getting started

	## 1. Add the package and let models provision

	Add `com.sky.ondeviceagent` to your project (Package Manager → Add package from git URL, or reference
	a local clone under `Packages/`). The six `com.sky.sentis.*` model packages are declared as hard
	dependencies and are pulled in with it.

	On-device models are not vendored in this repo — they are fetched on demand:

	- Sentis models (wake-word, VAD, Whisper STT, E5 text embeddings, Supertonic TTS, YOLOX vision)
	download from Hugging Face (`Sky-Kim/com.sky.sentis.*`) into each model package's `Models~/` folder
	on first Editor load (and again before a player build). An Editor step then copies them into
	`StreamingAssets/Model/` so the player ships them. No manual download step.
	- On-device LLM (Android) streams from Hugging Face on first launch; see
	[android-llm.md](android-llm.md).
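To confirm that provisioning actually populated the player-side folder, a small check outside Unity can help. The following is a minimal sketch, not part of the package: the function name `find_provisioned_models` is hypothetical, the `Assets/StreamingAssets/Model` location assumes the standard Unity project layout, and the `.sentis` suffix is assumed from the `yolox_fp16.sentis` file name mentioned in this guide.

```python
from pathlib import Path
from typing import List


def find_provisioned_models(
    project_root: str,
    rel_dir: str = "Assets/StreamingAssets/Model",  # assumed Unity project layout
    suffix: str = ".sentis",                        # assumed model file extension
) -> List[str]:
    """Return the model files found under the StreamingAssets model folder.

    Paths are relative to the model folder and use forward slashes.
    An empty list means the folder is missing or contains no model files.
    """
    model_dir = Path(project_root) / rel_dir
    if not model_dir.is_dir():
        return []
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob("*" + suffix)
        if p.is_file()
    )
```

Calling `find_provisioned_models("/path/to/your/project")` and getting an empty list suggests the provisioning or copy step has not run yet; the Editor console is the place to look next.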

	See [../THIRD_PARTY_NOTICES.md](../THIRD_PARTY_NOTICES.md) for each model's source and license.

	### YOLO detector model (YOLOX, Apache-2.0)

	The vision detector uses [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) (Apache-2.0). Do
	not use Ultralytics weights (`yolo26n`, AGPL-3.0). The weights file (`yolox_fp16.sentis`) belongs to the
	`com.sky.sentis.yolox` package and is provisioned from Hugging Face like the other Sentis models;
	`CocoYoloDetector` loads it at runtime.

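	The decoder expects a single output tensor of shape `[1, N, 5+C]` (or its transpose), where each of the
	`N` candidates is laid out as `[cx, cy, w, h, obj, classes...]` with box values in input-pixel coordinates
	and `C` class scores. The input is an RGB NCHW tensor with values in `[0,1]`. NMS runs on the C# side.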
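The output layout described above can be illustrated with a short decoding sketch. This is not the package's `CocoYoloDetector` code (which is C#); it is a Python illustration under stated assumptions: the batch dimension is already stripped so the input is `N` rows of `5+C` values, the final score is `obj` multiplied by the best class score (the usual YOLOX convention, not confirmed by this guide), and the function names and thresholds are hypothetical.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, score, class_id) in input-pixel coordinates
Detection = Tuple[float, float, float, float, float, int]


def decode(rows: Sequence[Sequence[float]], score_threshold: float = 0.25) -> List[Detection]:
    """Turn [cx, cy, w, h, obj, classes...] rows into corner-format detections."""
    detections: List[Detection] = []
    for row in rows:
        if len(row) < 6:
            continue  # need at least one class score
        cx, cy, w, h, obj = row[:5]
        class_scores = row[5:]
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = obj * class_scores[class_id]  # assumed score convention
        if score < score_threshold:
            continue
        detections.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, class_id))
    return detections


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections: List[Detection], iou_threshold: float = 0.45) -> List[Detection]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(det[5] != k[5] or iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

If the model emits the transposed layout, the rows would need to be transposed back before calling `decode`. Per-class suppression is one common choice; class-agnostic NMS is an equally valid variant.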

	### Sample knowledge index (optional, for RAG)

	The Voice Assistant sample ships a pre-built LightRAG index in its `StreamingAssets/VoiceAgent/DB/` folder, so
	retrieval works out of the box. To rebuild it from the synthetic corpus in
	`StreamingAssets/Knowledge/`, use the KnowledgeIngest Editor tool (menu added by
	`Runtime/AgentCore/Editor/KnowledgeIngestMenu.cs`). Rebuilding requires a running Ollama server to host
	the ingest model. Without an index the agent still runs; only knowledge retrieval is unavailable.

	## 2. Install the desktop LLM (Ollama)

	The Editor/desktop path talks to an [Ollama](https://ollama.com) server.

	```bash
	# install Ollama (see ollama.com), then:
	ollama pull gemma4:e2b # match the model configured on the agent
	ollama serve # default endpoint http://localhost:11434
	```

	> Note: `gemma4:e2b` is the tag configured in code (`AgentBuilder.Model` / the agent's `m_Model`),
	> not a public Ollama registry tag — the real Gemma tags are `gemma` / `gemma2` / `gemma3` / `gemma3n`.
	> Running `ollama pull gemma4:e2b` as-is will fail with "model not found" unless you have a matching
	> local model tagged that way. Either tag a local model as `gemma4:e2b`, or change `m_Model` on the
	> agent component in the Inspector to a tag you have pulled.

	The default endpoint and model are configurable on the agent component in the Inspector.
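Before pressing Play, it can be useful to verify that the server is reachable and that the configured tag is actually installed. The sketch below is a standalone check, not part of the package. It uses Ollama's `GET /api/tags` endpoint, which lists locally available models; the helper names `installed_models`, `has_model`, and `fetch_tags` are hypothetical, and the comparison is an exact string match on the tag.

```python
import json
from typing import List
from urllib.request import urlopen


def installed_models(tags_payload: dict) -> List[str]:
    """Extract model tags from an Ollama /api/tags response body."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if the exact tag (for example 'gemma4:e2b') is installed locally."""
    return wanted in installed_models(tags_payload)


def fetch_tags(endpoint: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Query a running Ollama server; raises URLError if it is unreachable."""
    with urlopen(endpoint.rstrip("/") + "/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `has_model(fetch_tags(), "gemma4:e2b")` returning `False` means the tag configured on the agent is not present locally, which matches the "model not found" situation described in the note above.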

	## 3. Run the sample

	1. Use Unity 6000.4.8f1 (Unity 6.x). Other versions are untested.
	2. Open the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) project, which
	references this package and wires the full pipeline into a scene.
	3. Open the sample scene, press Play, say the wake word, then ask a question (e.g. a URP/rendering
	question to exercise the bundled knowledge base, or any general question to exercise web search).

	See the [ondeviceagent-sample](https://github.com/skykim/ondeviceagent-sample) repository for sample
	details.

	## 4. Android on-device LLM (optional)

	To run the LLM fully on-device on Android instead of via Ollama, see [android-llm.md](android-llm.md).
	In short: side-load (or download) a `.litertlm` model; the `llm-release.aar` bridge ships in this
	package.

	## Troubleshooting

	- No response from the agent (desktop): confirm `ollama serve` is running and the configured
	model is pulled. Check the endpoint in the Inspector.
	- Models missing at runtime: confirm the Sentis model provisioning step ran (check the Editor
	console on load / before build) so `StreamingAssets/Model/` is populated; provisioning needs network
	access to Hugging Face.
	- Wake word not triggering: the voice pipeline needs microphone permission; on desktop, confirm
	the OS granted Unity microphone access.