Getting started

1. Add the package and let models provision

Add com.sky.ondeviceagent to your project (Package Manager → Add package from git URL, or reference a local clone under Packages/). The six com.sky.sentis.* model packages are declared as hard dependencies and are pulled in with it.

On-device models are not vendored in this repo — they are fetched on demand:

  • Sentis models (wake-word, VAD, Whisper STT, E5 text embeddings, Supertonic TTS, YOLOX vision) download from Hugging Face (Sky-Kim/com.sky.sentis.*) into each model package's Models~/ folder on first Editor load (and again before a player build). An Editor step then copies them into StreamingAssets/Model/ so the player ships them. No manual download step.
  • On-device LLM (Android) streams from Hugging Face on first launch; see android-llm.md.

See ../THIRD_PARTY_NOTICES.md for each model's source and license.

YOLO detector model (YOLOX, Apache-2.0)

The vision detector uses YOLOX (Apache-2.0). Do not use Ultralytics weights (yolo26n, AGPL-3.0). The weights file, yolox_fp16.sentis, belongs to the com.sky.sentis.yolox package and is provisioned from Hugging Face like the other Sentis models; CocoYoloDetector loads it at runtime.

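The decoder expects a single output tensor of shape [1, N, 5+C] (or its transpose), where each of the N rows is [cx, cy, w, h, obj, classes...] in input-pixel coordinates and C is the number of classes. The model input is RGB in NCHW layout with values in [0, 1]. NMS runs on the C# side.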
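The output layout described above can be illustrated with a small sketch. This is not the package's C# decoder: the function names, the thresholds, and the choice of score (objectness times the best class score, assumed to be already in [0, 1]) are assumptions made for illustration, and the transposed layout is not handled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input pixels
Detection = Tuple[Box, float, int]       # (box, score, class_id)


def decode_rows(rows: List[List[float]], score_threshold: float = 0.25) -> List[Detection]:
    """Convert [cx, cy, w, h, obj, classes...] rows into scored corner boxes."""
    detections: List[Detection] = []
    for row in rows:
        cx, cy, w, h, obj = row[:5]
        class_scores = row[5:]
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        # Assumed scoring rule: objectness multiplied by the best class score.
        score = obj * class_scores[class_id]
        if score >= score_threshold:
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            detections.append((box, score, class_id))
    return detections


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections: List[Detection], iou_threshold: float = 0.45) -> List[Detection]:
    """Greedy per-class NMS: keep the best box, drop same-class boxes that overlap it."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical output rows for C = 2 classes (the leading batch dimension is removed).
rows = [
    [100, 100, 40, 40, 0.9, 0.8, 0.1],   # class 0, strong
    [102, 101, 40, 40, 0.8, 0.7, 0.2],   # class 0, overlaps the first box
    [300, 200, 60, 30, 0.9, 0.05, 0.9],  # class 1
    [10, 10, 5, 5, 0.1, 0.5, 0.5],       # below the score threshold
]
kept = nms(decode_rows(rows))
for box, score, class_id in kept:
    print(class_id, round(score, 2), box)
```

Because the boxes are already in input-pixel coordinates, mapping them back to the camera frame only needs the scale and padding that were applied when the frame was resized for the model.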

Sample knowledge index (optional, for RAG)

The Voice Assistant sample ships a pre-built LightRAG index in its StreamingAssets/VoiceAgent/DB/ folder, so retrieval works out of the box. To rebuild it from the synthetic corpus in StreamingAssets/Knowledge/, use the KnowledgeIngest Editor tool (menu added by Runtime/AgentCore/Editor/KnowledgeIngestMenu.cs). Rebuilding requires a running Ollama server to host the ingest model. Without an index the agent still runs; only knowledge retrieval is unavailable.

2. Install the desktop LLM (Ollama)

The Editor/desktop path talks to an Ollama server.

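# install Ollama (see ollama.com), then:
ollama serve                # default endpoint http://localhost:11434; skip if Ollama is already running in the background
ollama pull gemma4:e2b      # needs the server running (use a second terminal); match the model configured on the agent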

Note: gemma4:e2b is the tag configured in code (AgentBuilder.Model / the agent's m_Model), not a public Ollama registry tag; the real Gemma tags are gemma / gemma2 / gemma3 / gemma3n. Running ollama pull gemma4:e2b as-is will fail because the registry has no model under that tag, so the agent can use that name only if a local model carries it. Either give a local model that tag (for example, ollama cp <existing-model> gemma4:e2b) or change m_Model on the agent component in the Inspector to a tag you have pulled.

The default endpoint and model are configurable on the agent component in the Inspector.
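To confirm that the server is reachable and that the configured tag exists locally, you can query Ollama's GET /api/tags endpoint, which lists the models in the local store. The sketch below is a standalone helper and not part of the package; the function names are hypothetical, and the defaults mirror the endpoint and tag mentioned in this section.

```python
import json
import urllib.request
from typing import List


def pulled_model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def is_model_available(configured: str, names: List[str]) -> bool:
    """Check for the configured tag; a name without ':tag' is treated as ':latest'."""
    wanted = configured if ":" in configured else configured + ":latest"
    return wanted in names


def fetch_tags(endpoint: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Ask the local Ollama server which models it has."""
    with urllib.request.urlopen(endpoint + "/api/tags", timeout=timeout) as response:
        return json.load(response)


def check_ollama(model: str = "gemma4:e2b", endpoint: str = "http://localhost:11434") -> str:
    """Return a one-line status message instead of raising on connection problems."""
    try:
        names = pulled_model_names(fetch_tags(endpoint))
    except (OSError, ValueError) as error:  # connection refused, timeout, or bad JSON
        return f"Could not query Ollama at {endpoint}: {error}"
    if is_model_available(model, names):
        return f"{model} is available"
    return f"{model} is not in the local store; found: {names}"
```

Calling print(check_ollama()) from a terminal reports either a connection problem or whether the tag is present, which separates "server not running" from "model not pulled" before you open the Editor.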

3. Run the sample

  1. Use Unity 6000.4.8f1 (Unity 6.x). Other versions are untested.
  2. Open the ondeviceagent-sample project, which references this package and wires the full pipeline into a scene.
  3. Open the sample scene, press Play, say the wake word, then ask a question (e.g. a URP/rendering question to exercise the bundled knowledge base, or any general question to exercise web search).

See the ondeviceagent-sample repository for sample details.

4. Android on-device LLM (optional)

To run the LLM fully on-device on Android instead of via Ollama, see android-llm.md. In short: side-load (or download) a .litertlm model; the llm-release.aar bridge ships in this package.

Troubleshooting

  • No response from the agent (desktop): confirm ollama serve is running and the configured model is pulled. Check the endpoint in the Inspector.
  • Models missing at runtime: confirm the Sentis model provisioning step ran (check the Editor console on load / before build) so StreamingAssets/Model/ is populated; provisioning needs network access to Hugging Face.
  • Wake word not triggering: the voice pipeline needs microphone permission; on desktop, confirm the OS granted Unity microphone access.
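For the "models missing at runtime" case, a quick look at the provisioned folder can help. The sketch below lists model files under StreamingAssets/Model/; it assumes the folder sits under the project's Assets/ directory (the usual Unity location) and that model files use the .sentis extension, as yolox_fp16.sentis does. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def list_model_files(project_root: str, suffix: str = ".sentis") -> List[str]:
    """Return model files found under Assets/StreamingAssets/Model, relative to that folder."""
    model_dir = Path(project_root) / "Assets" / "StreamingAssets" / "Model"
    if not model_dir.is_dir():
        return []  # provisioning has not populated the folder yet
    return sorted(
        path.relative_to(model_dir).as_posix()
        for path in model_dir.rglob("*" + suffix)
        if path.is_file()
    )


if __name__ == "__main__":
    found = list_model_files(".")  # run from the Unity project root
    print(found if found else "No model files found; check the Editor console for provisioning errors.")
```

An empty result points at the provisioning step (or at network access to Hugging Face) and not at the runtime loader.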