Google Tensor G4 NPU: DGC0 Compiler Toolchains

Two independently developed reverse-engineering toolchains for producing DGC0 (EdgeTPU bytecode) for the Google Tensor G4 NPU (the darwinn EdgeTPU in Pixel 9-series phones). Both were built to answer one question: how do you compile a model to run on the G4 NPU when the compiler is gated/undocumented?

What DGC0 is: the on-device NPU consumes a compiled bytecode blob whose magic bytes are 44 47 43 30 = "DGC0". The compiler that produces it is what these toolchains drive.
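
A quick sanity check on a candidate blob is to look for those four bytes. The sketch below is only an illustration, not the repo's parser (dgc0_parse.{cc,h}); the name find_dgc0_magic is hypothetical. This text does not say where in the blob the magic sits, so the sketch assumes two candidate offsets: 0, and 4 (where a FlatBuffers file identifier is normally stored).

```python
from typing import Optional

# 44 47 43 30 in hex is the ASCII string "DGC0".
DGC0_MAGIC = bytes([0x44, 0x47, 0x43, 0x30])


def find_dgc0_magic(blob: bytes) -> Optional[int]:
    """Return the offset (0 or 4) at which the DGC0 magic appears, else None.

    Assumption: the magic is either the first four bytes of the blob or a
    FlatBuffers file identifier at bytes 4..7. Neither offset is confirmed
    by the text above.
    """
    for offset in (0, 4):
        if blob[offset:offset + 4] == DGC0_MAGIC:
            return offset
    return None
```

Reading the first 8 bytes of a file and passing them to find_dgc0_magic is enough to apply the check to a compiled output.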

Status (honest version): these are RE toolchains from before official access existed. The official Google Tensor ML SDK (Beta) (https://ai.google.dev/edge/litert) is the sanctioned production path, and its big advantage is fusing the transformer into monolithic DGCs. But it is not strictly a superset of what's here: the Beta has its own hard walls (a hardcoded ~60 s HAL compile deadline, a per-context capacity cap, death on certain odml STABLEHLO composites, INTERNAL crashes on some 4-bit / large-tensor graphs). These toolchains drive the compiler more directly, with fewer wrapper constraints, so they remain useful and may still compile cases the Beta chokes on, especially with tighter quantization (smaller DGC0s) or reconfigured partitioning to beat the fragmentation wall. Published as active research, not a closed chapter: prefer the official SDK for production, and reach for these when it hits a wall.

What is NOT in this repo: none of the vendor's closed compiler binaries are redistributed here, including liblitert_plugin_compiler.so, libLiteRtCompilerPlugin_google_tensor.so, and libedgetpu_tflite_compiler.so. These toolchains call those; they don't contain them. Reference third-party source (the open libedgetpu runtime, the Pixel kernel edgetpu driver) is also linked, not bundled. Only original RE work + docs are here.


1. Cross-Compile Bridge (cross-compile-bridge/)

(formerly nicknamed "cracked SDK")

What it does: compiles DGC0 on an x86_64 Linux host (in Docker) and deploys it to the arm64 phone over adb. It is a cross-compilation bridge, because the phone can't run the x86 compiler natively.
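
The deploy half of the bridge amounts to an adb push of the compiled blob. A minimal sketch of that step follows; the function names, the file name, and the remote directory are illustrative assumptions, not taken from the repo.

```python
import subprocess
from typing import List, Optional


def adb_push_command(local_path: str, remote_path: str,
                     serial: Optional[str] = None) -> List[str]:
    """Build the argv for `adb [-s SERIAL] push LOCAL REMOTE`."""
    cmd = ["adb"]
    if serial is not None:
        # -s selects one device when several are attached.
        cmd += ["-s", serial]
    cmd += ["push", local_path, remote_path]
    return cmd


def deploy(local_path: str, remote_path: str,
           serial: Optional[str] = None) -> None:
    """Run the push; raises CalledProcessError if adb reports failure."""
    subprocess.run(adb_push_command(local_path, remote_path, serial), check=True)
```

For example, deploy("model.dgc", "/data/local/tmp/model.dgc") would copy a host-compiled blob to a commonly writable location on the device (both paths are placeholders).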

How it works:

  1. The Google Tensor compiler adapter + engine ship publicly on PyPI: pip install ai-edge-litert-nightly drops vendors/google_tensor/compiler/libLiteRtCompilerPlugin_google_tensor.so (an x86_64 adapter), which loads the real engine.
  2. The "experimental access" gate turned out to be an env-var check + a filename dlopen check over that already-public compiler (see cross-compile-bridge/00_TLDR.md: "mostly theater").
  3. The RE work decoded the C ABI of GoogleTensorCompileFlatbuffer (arg layout, 1=success return, *a6=bytecode_count, *a7=error_msg, the config protobuf, the options struct with soc_model="Tensor_G4") so the public compiler can be driven directly from a stub.
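
As a starting point for step 1, the installed adapter can be located on disk before anything is loaded. The sketch below searches a set of directories for the relative path quoted above; find_compiler_plugin is a hypothetical helper, and the name of the parent package directory is not assumed (any directory that ends in the quoted relative path matches).

```python
from pathlib import Path
from typing import Iterable, List

# Relative path quoted in step 1 above.
PLUGIN_RELATIVE_PATH = Path(
    "vendors/google_tensor/compiler/libLiteRtCompilerPlugin_google_tensor.so"
)


def find_compiler_plugin(roots: Iterable[str]) -> List[Path]:
    """Return every file under the given roots whose path ends with
    PLUGIN_RELATIVE_PATH. Roots that are not directories are skipped."""
    suffix = "/" + PLUGIN_RELATIVE_PATH.as_posix()
    hits = []
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue
        # Recursive search by file name, then filter on the full suffix so
        # that same-named files in other locations are ignored.
        for candidate in base.rglob(PLUGIN_RELATIVE_PATH.name):
            if candidate.as_posix().endswith(suffix):
                hits.append(candidate)
    return sorted(hits)
```

Passing site.getsitepackages() as roots is one reasonable choice after the pip install; an empty result means the package layout differs from the path given here.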

To rebuild:

Read: 00_TLDR.md → 02_C_ABI.md → 03_STUB_BUILD.md → 05_WALL_BROKEN.md.


2. On-Device Compiler Driver (on-device-compiler-driver/)

(formerly nicknamed "probe6" / compiler_probe6)

What it does: produces DGC0 natively on the phone (no AICore, no NNAPI, no edgetpu_app_service) by driving the vendor's OWN on-device compiler through its public C entry point.

How it works:

  1. dlopen("/vendor/lib64/libedgetpu_tflite_compiler.so", RTLD_NOW|RTLD_GLOBAL) loads the device's own compiler; its constructors run and populate the filewrapper TOC.
  2. Resolve and call CompileTfliteFlatbuffer2 (the V2 C ABI: 8 args, AArch64 AAPCS x0..x7). V1 is a buggy 20-byte shim that zeroes the new slot-4 arg.
  3. Output → a DGC0 blob (proven: a 64 MB DGC written on-device, status: OK).
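
The actual driver is C (compiler_probe6.c). Purely to illustrate the load-and-resolve sequence in steps 1 and 2, here is the same pattern expressed with Python's ctypes. It is a sketch under assumptions: resolve_symbol is a hypothetical helper, and the function is deliberately not called, because the 8-argument layout is not spelled out in this text.

```python
import ctypes
from typing import Optional

# Path and symbol name as given in the steps above.
DEVICE_COMPILER = "/vendor/lib64/libedgetpu_tflite_compiler.so"
ENTRY_POINT = "CompileTfliteFlatbuffer2"


def resolve_symbol(library_path: Optional[str], symbol: str):
    """dlopen a library with RTLD_GLOBAL and look up one exported symbol.

    Returns the ctypes function object, or None if the symbol is absent.
    Raises OSError if the library itself cannot be loaded.
    """
    # On POSIX, ctypes always adds RTLD_NOW, so this corresponds to
    # RTLD_NOW|RTLD_GLOBAL. Loading runs the library's constructors.
    lib = ctypes.CDLL(library_path, mode=ctypes.RTLD_GLOBAL)
    try:
        return getattr(lib, symbol)
    except AttributeError:
        # ctypes signals a failed dlsym lookup with AttributeError.
        return None
```

On a device with a Python interpreter, resolve_symbol(DEVICE_COMPILER, ENTRY_POINT) would report whether the V2 entry point is exported. Setting argtypes and restype, and making the call itself, should follow the ABI notes in the repo rather than guesses.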

To rebuild:

  • The compiler itself is the device's /vendor/lib64/libedgetpu_tflite_compiler.so; pull it from a rooted Tensor G4 (Pixel 9-series) device via adb. It is NOT redistributed here.
  • Build the driver (compiler_probe6.c) for arm64 with the Android NDK, then run on-device.
  • Reference source for understanding the runtime + DGC0 format (for study, not required to run):
  • LiteRT-LM (the LLM runtime that consumes these): https://github.com/google-ai-edge/litert-lm
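
Since the two toolchains deal with binaries for different architectures (an x86_64 adapter on the host, an arm64 library pulled from the device), it can be useful to confirm what a given .so targets before trying to load it. The sketch below reads the e_machine field from an ELF header; the helper names are hypothetical, and the constants are the standard ELF values for x86-64 and AArch64.

```python
import struct
from typing import Optional

EM_X86_64 = 62    # standard ELF e_machine value for x86-64
EM_AARCH64 = 183  # standard ELF e_machine value for AArch64


def elf_machine(header: bytes) -> Optional[int]:
    """Return e_machine from the first bytes of an ELF file, else None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at byte offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine


def machine_name(header: bytes) -> str:
    """Human-readable label for the two architectures discussed here."""
    machine = elf_machine(header)
    if machine is None:
        return "not ELF"
    return {EM_X86_64: "x86_64", EM_AARCH64: "aarch64"}.get(
        machine, "other (%d)" % machine
    )
```

Reading the first 20 bytes of a pulled library and passing them to machine_name is enough; a file pulled from the phone would be expected to report aarch64.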

Read: 00_NAMING_On-Device-Compiler-Driver.md → BREAKTHROUGH_PROBE6.md → BEST_ARCHITECTURE.md. The driver source is compiler_probe6.c (earlier iterations compiler_probe.c … _probe5.c); the DGC0 FlatBuffer parser is dgc0_parse.{cc,h}.


License

Original RE code, drivers, and documentation in this repo: Apache-2.0. Third-party source referenced above keeps its own upstream license (Apache-2.0 for libedgetpu, GPL-2.0 for the kernel driver) and is intentionally not bundled here; follow the links. No vendor closed binaries are included.

Credits

Reverse-engineering + toolchains by xThr45hx (AI-assisted). Published as a technical record; use responsibly and prefer the official Tensor ML SDK for production.
