OnDeviceAgent Framework

A Unity package for building on-device voice and vision agents: wake-word detection, speech-to-text, a tool-calling LLM, retrieval-augmented generation, on-device vision, and text-to-speech wired into a single local agent runtime.

Install

This package depends on six com.sky.sentis.* model packages. They are not published to a UPM registry, so Unity does not resolve them transitively from a git URL. Add all seven git URLs to your project's Packages/manifest.json (this package plus the six Sentis packages):

"dependencies": {
  "com.sky.ondeviceagent": "https://huggingface.co/Sky-Kim/com.sky.ondeviceagent.git",
  "com.sky.sentis.e5": "https://huggingface.co/Sky-Kim/com.sky.sentis.e5.git",
  "com.sky.sentis.whisper": "https://huggingface.co/Sky-Kim/com.sky.sentis.whisper.git",
  "com.sky.sentis.openwakeword": "https://huggingface.co/Sky-Kim/com.sky.sentis.openwakeword.git",
  "com.sky.sentis.silero-vad": "https://huggingface.co/Sky-Kim/com.sky.sentis.silero-vad.git",
  "com.sky.sentis.supertonic": "https://huggingface.co/Sky-Kim/com.sky.sentis.supertonic.git",
  "com.sky.sentis.yolox": "https://huggingface.co/Sky-Kim/com.sky.sentis.yolox.git"
}

The package binaries and model weights are stored in Git LFS, so git-lfs must be installed on your machine for the Package Manager to fetch them.

This package also depends on the External Dependency Manager, served from the OpenUPM scoped registry. Add it to Packages/manifest.json if it is not already present:

"scopedRegistries": [
  { "name": "package.openupm.com", "url": "https://package.openupm.com",
    "scopes": ["com.google.external-dependency-manager"] }
]
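
The two manifest fragments above can also be merged by script. The following is a minimal Python sketch, not something shipped with this package: patch_manifest and patch_file are hypothetical helper names, and it assumes a standard Packages/manifest.json. It adds the seven git URLs and the OpenUPM scope only where they are missing, so running it twice leaves the file unchanged.

```python
import json
from pathlib import Path

# Base URL and package names are taken from the Install section above.
HF_BASE = "https://huggingface.co/Sky-Kim"
GIT_PACKAGES = [
    "com.sky.ondeviceagent",
    "com.sky.sentis.e5",
    "com.sky.sentis.whisper",
    "com.sky.sentis.openwakeword",
    "com.sky.sentis.silero-vad",
    "com.sky.sentis.supertonic",
    "com.sky.sentis.yolox",
]
OPENUPM = {
    "name": "package.openupm.com",
    "url": "https://package.openupm.com",
    "scopes": ["com.google.external-dependency-manager"],
}


def patch_manifest(manifest: dict) -> dict:
    """Return a copy of `manifest` with the git dependencies and OpenUPM scope added.

    Hypothetical helper: existing entries are never overwritten.
    """
    out = json.loads(json.dumps(manifest))  # cheap deep copy of JSON data

    deps = out.setdefault("dependencies", {})
    for name in GIT_PACKAGES:
        deps.setdefault(name, f"{HF_BASE}/{name}.git")

    registries = out.setdefault("scopedRegistries", [])
    for registry in registries:
        if registry.get("url") == OPENUPM["url"]:
            # Registry already present: only add the missing scope.
            scopes = registry.setdefault("scopes", [])
            for scope in OPENUPM["scopes"]:
                if scope not in scopes:
                    scopes.append(scope)
            break
    else:
        registries.append(json.loads(json.dumps(OPENUPM)))
    return out


def patch_file(path) -> None:
    """Read a manifest.json, patch it, and write it back in place."""
    p = Path(path)
    manifest = json.loads(p.read_text(encoding="utf-8"))
    p.write_text(json.dumps(patch_manifest(manifest), indent=2) + "\n", encoding="utf-8")
```

Calling patch_file("Packages/manifest.json") from the project root rewrites the manifest in place. Entries that already exist, including a URL you pinned yourself for one of these packages, are left untouched.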

Models

This framework uses two kinds of on-device models:

Sentis models (wake-word, VAD, speech-to-text, text-to-speech, vision, and the RAG text embedder) ship as six separate embedded UPM packages, com.sky.sentis.* (e5, whisper, openwakeword, silero-vad, supertonic, yolox), with the FP16 weights under each package's Models~/ folder. In the Editor the framework loads them straight from those packages; before a player build, an Editor step copies them into StreamingAssets/Model/ so the player ships them. All six are declared as hard dependencies of this package, but they do not resolve automatically from a git URL, so you must add them to your project yourself (see Install).
On-device LLM (Android): the tool-calling LLM weights (LiteRT-LM .litertlm) are streamed from Hugging Face on first launch and cached in the app's persistent data path. The model is gated, so you must provide a Hugging Face access token.
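
The download-once-then-cache behavior described above can be illustrated outside Unity. This is a Python sketch of the general pattern only, not the framework's own code (which runs on Android): build_request and fetch_cached are hypothetical names, and the URL you pass in would be the Hugging Face file URL of the gated model. The one detail taken from Hugging Face itself is that gated files are requested with an Authorization: Bearer header carrying the access token.

```python
import urllib.request
from pathlib import Path


def build_request(url: str, token: str) -> urllib.request.Request:
    """Attach the Hugging Face access token as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_cached(url: str, token: str, cache_dir, opener=urllib.request.urlopen) -> Path:
    """Download `url` into `cache_dir` unless a cached copy already exists.

    `opener` is injectable so the logic can be exercised without network access.
    """
    target = Path(cache_dir) / url.rsplit("/", 1)[-1]
    if target.exists():
        return target  # later launches reuse the cached weights

    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with opener(build_request(url, token)) as response, open(partial, "wb") as out:
        # Copy in 1 MiB chunks so a large model file is never held in memory.
        while chunk := response.read(1 << 20):
            out.write(chunk)
    # Rename only after a complete download, so an interrupted transfer
    # is not mistaken for a valid cache entry on the next launch.
    partial.replace(target)
    return target
```

Writing to a .part file first is a design choice of this sketch: it keeps a half-finished download from being treated as cached weights.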

Retrieval-augmented generation (LightRAG.NET)

The RAG pipeline is powered by LightRAG.NET, a C# port of LightRAG (graph + vector retrieval). It ships here as a prebuilt managed plugin (LightRAG.NET.dll, netstandard2.1) under Runtime/Plugins/LightRAG/, built from the v0.2.0 release. The framework drives it with the on-device E5 text embedder (com.sky.sentis.e5) and the tool-calling LLM, and talks to Ollama over raw HTTP (the LightRAG.Providers.Ollama provider is intentionally not bundled). To update it, drop a newer LightRAG.NET.dll from the release page into that folder.

Android on-device LLM

On Android the tool-calling LLM runs on-device through a thin Kotlin/JNI bridge over the LiteRT-LM runtime (AndroidLlmTransport, called via JNI at runtime). The bridge ships entirely within this package:

AAR: Runtime/Plugins/Android/llm-release.aar (class com.ondeviceagent.llm.LlmBridge)
EDM4U deps: Runtime/Plugins/Android/Editor/LiteRtDependencies.xml (LiteRT-LM + Qualcomm QNN for NPU)
Kotlin source + rebuild tooling: AndroidBridge~/ (the trailing ~ hides the folder from Unity's importer; rebuild with JDK 17 via ./gradlew assembleRelease and copy the resulting AAR into Runtime/Plugins/Android/)

Run Assets ▸ External Dependency Manager ▸ Android Resolver ▸ Resolve to fetch the Maven deps.

⚠️ Redistribution note. The AAR embeds libLiteRtDispatch_Qualcomm.so, and the EDM4U manifest pulls Qualcomm QNN Maven artifacts; both are proprietary. They are included for convenience so the NPU backend works out of the box, but redistribution rights are unconfirmed. If you redistribute this package (or a player build that includes it), first confirm Qualcomm's redistribution terms for those binaries, or strip the NPU .so and QNN deps and ship GPU/CPU only.

Sample

For a runnable project that wires the full pipeline into a scene, see the ondeviceagent-sample repository.

License

Apache-2.0. Bundled third-party libraries and downloaded model weights carry their own licenses; see THIRD_PARTY_NOTICES.md in the repository root.
