Google Tensor G4 NPU — DGC0 Compiler Toolchains

Two independently developed reverse-engineering (RE) toolchains for producing DGC0 (EdgeTPU bytecode) for the Google Tensor G4 NPU (the darwinn EdgeTPU in Pixel 9-series phones). Both were built to answer one question: how do you compile a model to run on the G4 NPU when the compiler is gated and undocumented?

What DGC0 is: the on-device NPU consumes a compiled bytecode blob whose magic bytes are 44 47 43 30 = "DGC0". The compiler that produces it is what these toolchains drive.
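A quick way to sanity-check a compiler output is to look for those four bytes. The sketch below is illustrative only: the text above gives the magic but not its offset, so the helper (a hypothetical name, `find_dgc0_magic`) searches a small window at the start of the blob instead of assuming offset 0. That window also covers offset 4, which is where a FlatBuffers file identifier would sit if the container stores the magic that way.

```python
# ASCII "DGC0" = 44 47 43 30, as stated in the text above.
DGC0_MAGIC = bytes([0x44, 0x47, 0x43, 0x30])


def find_dgc0_magic(blob: bytes, search_window: int = 16) -> int:
    """Return the offset of the DGC0 magic near the start of blob, or -1.

    Assumption: the exact offset is not documented here, so we scan the
    first `search_window` bytes rather than hardcoding a position.
    """
    return blob[:search_window].find(DGC0_MAGIC)


def looks_like_dgc0(blob: bytes) -> bool:
    """True if the magic appears within the search window."""
    return find_dgc0_magic(blob) >= 0
```

In practice this would be applied to the bytes of a file pulled from the device or written by the Docker build, for example `looks_like_dgc0(open(path, "rb").read(16))`.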

Status (honest version): these are RE toolchains built before official access existed. The official Google Tensor ML SDK (Beta) (https://ai.google.dev/edge/litert) is the sanctioned production path, and its big advantage is that it fuses the transformer into monolithic DGCs. It is not a strict superset of what's here, though. The Beta has its own hard walls: a hardcoded ~60 s HAL compile deadline, a per-context capacity cap, outright failure on certain odml STABLEHLO composites, and INTERNAL crashes on some 4-bit / large-tensor graphs. These toolchains drive the compiler more directly, with fewer wrapper constraints, so they remain useful and may still compile cases the Beta chokes on, especially with tighter quantization (smaller DGC0s) or reconfigured partitioning to beat the fragmentation wall. This is published as active research, not a closed chapter: prefer the official SDK for production, and reach for these when it hits a wall.

What is NOT in this repo: none of the vendor's closed compiler binaries are redistributed here — not liblitert_plugin_compiler.so, libLiteRtCompilerPlugin_google_tensor.so, or libedgetpu_tflite_compiler.so. These toolchains call those; they don't contain them. Reference third-party source (the open libedgetpu runtime, the Pixel kernel edgetpu driver) is also linked, not bundled. Only original RE work + docs are here.

1. Cross-Compile Bridge (`cross-compile-bridge/`)

(formerly nicknamed "cracked SDK")

What it does: compiles DGC0 on an x86_64 Linux host (in Docker) and deploys it to the arm64 phone over adb — a cross-compilation bridge, because the phone can't run the x86 compiler natively.

How it works:

- The Google Tensor compiler adapter + engine ship publicly on PyPI: `pip install ai-edge-litert-nightly` drops `vendors/google_tensor/compiler/libLiteRtCompilerPlugin_google_tensor.so` (an x86_64 adapter), which loads the real engine.
- The "experimental access" gate turned out to be an env-var check + a filename dlopen check over that already-public compiler (see `cross-compile-bridge/00_TLDR.md`: "mostly theater").
- The RE work decoded the C ABI of `GoogleTensorCompileFlatbuffer` (arg layout, `1` = success return, `*a6` = `bytecode_count`, `*a7` = `error_msg`, the config protobuf, the options struct with `soc_model="Tensor_G4"`) so the public compiler can be driven directly from a stub.
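The return convention is easy to get backwards, since `1` means success rather than the more common `0`. As a minimal sketch of how a caller might interpret one call, the helper below takes the three values named above (the return code, the count written through `a6`, the message written through `a7`). Everything else is an assumption for illustration: the names `CompileOutcome` and `interpret_compile_call` are hypothetical, the error message is assumed to be a byte string, and any return value other than `1` is treated as failure. The full argument layout is documented in 02_C_ABI.md and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompileOutcome:
    ok: bool
    bytecode_count: int
    error: Optional[str]


def interpret_compile_call(
    ret: int, bytecode_count: int, error_msg: Optional[bytes]
) -> CompileOutcome:
    """Map the raw results of one compile call onto a simple outcome.

    ret            -- function return value; 1 means success (per the text)
    bytecode_count -- value written through the a6 out-pointer
    error_msg      -- bytes written through the a7 out-pointer (assumed encoding)
    """
    if ret == 1:
        return CompileOutcome(ok=True, bytecode_count=bytecode_count, error=None)
    # Assumption: anything other than 1 is a failure.
    if error_msg:
        text = error_msg.decode("utf-8", errors="replace")
    else:
        text = "no error message returned"
    return CompileOutcome(ok=False, bytecode_count=0, error=text)
```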

To rebuild:

- Compiler: `pip install ai-edge-litert-nightly` (PyPI), which provides the x86_64 Tensor compiler adapter + engine.
- Official SDK context: https://ai.google.dev/edge/litert and the Tensor ML SDK page https://ai.google.dev/edge/litert/next/tensor_ml_sdk.
- Then: the stub source (`docker/stub_compiler.c`), the Dockerfiles, and `docker/reproduce.sh` in this folder drive it.
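After the `pip install`, the first practical step is confirming where the adapter landed. The sketch below walks a directory tree looking for the relative path quoted earlier (`vendors/google_tensor/compiler/libLiteRtCompilerPlugin_google_tensor.so`). The function name `find_compiler_plugin` is hypothetical, and the code deliberately makes no assumption about the package's top-level directory name: it only matches on the path suffix.

```python
from pathlib import Path
from typing import Optional

# Relative location of the adapter inside the installed package, per the text.
PLUGIN_RELATIVE = Path(
    "vendors/google_tensor/compiler/libLiteRtCompilerPlugin_google_tensor.so"
)


def find_compiler_plugin(search_root: str) -> Optional[Path]:
    """Return the first file under search_root whose path ends with
    PLUGIN_RELATIVE, or None if no such file exists."""
    root = Path(search_root)
    for candidate in sorted(root.rglob(PLUGIN_RELATIVE.name)):
        if candidate.as_posix().endswith(PLUGIN_RELATIVE.as_posix()):
            return candidate
    return None
```

A typical call would pass the active environment's site-packages directory, for example `find_compiler_plugin(sysconfig.get_paths()["purelib"])`. A `None` result suggests the install did not include the Google Tensor vendor files for that platform.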

Read: 00_TLDR.md → 02_C_ABI.md → 03_STUB_BUILD.md → 05_WALL_BROKEN.md.

2. On-Device Compiler Driver (`on-device-compiler-driver/`)

(formerly nicknamed "probe6" / compiler_probe6)

What it does: produces DGC0 natively on the phone — no AICore, no NNAPI, no edgetpu_app_service — by driving the vendor's OWN on-device compiler through its public C entry point.

How it works:

- `dlopen("/vendor/lib64/libedgetpu_tflite_compiler.so", RTLD_NOW|RTLD_GLOBAL)` loads the device's own compiler; its constructors run and populate the filewrapper TOC.
- Resolve and call `CompileTfliteFlatbuffer2` (the V2 C ABI: 8 args, passed in x0..x7 under the AArch64 AAPCS). V1 is a buggy 20-byte shim that zeroes the new slot-4 arg.
- Output: a DGC0 blob (proven: a 64 MB DGC written on-device, status: OK).
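The load-and-resolve half of that sequence can be sketched in a few lines. The real driver is C built with the NDK (`compiler_probe6.c`); Python's `ctypes` is used here only to illustrate the same `dlopen` flags and symbol lookup. The sketch stops before the call itself, because the eight argument types are not listed in this README, and the function name `resolve_entry_point` is hypothetical.

```python
import ctypes
import os

COMPILER_PATH = "/vendor/lib64/libedgetpu_tflite_compiler.so"
ENTRY_POINT = "CompileTfliteFlatbuffer2"


def resolve_entry_point(path: str = COMPILER_PATH, symbol: str = ENTRY_POINT):
    """dlopen the library with RTLD_NOW|RTLD_GLOBAL and look up the symbol.

    Returns a ctypes function object, or None if either the library or the
    symbol cannot be found. The caller would still need to set argtypes and
    restype before invoking it; those are intentionally not guessed here.
    """
    try:
        # Loading runs the library's constructors, as described above.
        lib = ctypes.CDLL(path, mode=os.RTLD_NOW | os.RTLD_GLOBAL)
    except OSError:
        return None
    try:
        return getattr(lib, symbol)
    except AttributeError:
        return None
```

On a machine without the vendor library this simply returns `None`, which makes it a cheap preflight check before running the actual driver.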

To rebuild:

- The compiler itself is the device's `/vendor/lib64/libedgetpu_tflite_compiler.so`; pull it from a rooted Tensor G4 (Pixel 9-series) device via adb. It is NOT redistributed here.
- Build the driver (`compiler_probe6.c`) for arm64 with the Android NDK, then run it on-device.
- Reference source for understanding the runtime + DGC0 format (for study, not required to run):
  - Open EdgeTPU runtime: https://github.com/google-coral/libedgetpu
  - Pixel/Tensor kernel edgetpu driver: ships in the Pixel kernel source (`drivers/edgetpu`).
  - LiteRT-LM (the LLM runtime that consumes these): https://github.com/google-ai-edge/litert-lm

Read: 00_NAMING_On-Device-Compiler-Driver.md → BREAKTHROUGH_PROBE6.md → BEST_ARCHITECTURE.md. The driver source is compiler_probe6.c (earlier iterations compiler_probe.c … _probe5.c); the DGC0 FlatBuffer parser is dgc0_parse.{cc,h}.

License

Original RE code, drivers, and documentation in this repo: Apache-2.0. Third-party source referenced above keeps its own upstream license (Apache-2.0 for libedgetpu, GPL-2.0 for the kernel driver) and is intentionally not bundled here — follow the links. No vendor closed binaries are included.

Credits

Reverse-engineering + toolchains by xThr45hx (AI-assisted). Published as a technical record; use responsibly and prefer the official Tensor ML SDK for production.
