Strix‑Halo build of GLM‑5.2‑MXFP4 and simple benchmarks

#1
by neoOpus - opened

Hello, and thank you for making GLM 5.2 MXFP4 available. Could you share a build, or instructions, specific to Strix Halo, along with some benchmark numbers so that others can replicate the results? I would like to run this on Strix Halo. Since most of us use this machine as a workstation as well as inference hardware, a version that also runs on Windows or WSL would be greatly appreciated.

The things that would be most helpful:

Install or build: pre-made artifacts or a brief script that demonstrates the precise Strix-Halo conversion and packaging procedures.

Runtime specifics: versions of the operating system, drivers, runtime, compiler, and libraries used.

Quantization: the precise flags or commands you used and the quant modes you tested.

Notes on compatibility: any necessary model modifications, operator adjustments, or unique runtime flags.

Benchmarks: results from prolonged runs and, if at all possible, power draw or thermal notes.

Precise commands: the commands you used for loading, warmup, and timed runs, plus any environment variables you set.
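
To make the warmup and timed-run part of the request concrete, here is a rough sketch of the kind of harness I have in mind. It is only an illustration: `benchmark`, `generate`, and `fake_generate` are hypothetical names, not part of any real runtime, and `fake_generate` is a stand-in that would be replaced by whatever call actually runs the model on Strix Halo.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(generate: Callable[[str, int], int],
              prompt: str,
              max_new_tokens: int,
              warmup_runs: int = 2,
              timed_runs: int = 5) -> Dict[str, float]:
    """Run untimed warmup passes, then time several generation passes.

    `generate` is a hypothetical callable that performs one generation
    and returns the number of tokens it actually produced.
    """
    # Warmup passes are discarded so one-time setup cost is not measured.
    for _ in range(warmup_runs):
        generate(prompt, max_new_tokens)

    rates: List[float] = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        produced = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        rates.append(produced / elapsed)  # tokens per second for this pass

    return {
        "runs": float(timed_runs),
        "mean_tok_per_s": statistics.mean(rates),
        "min_tok_per_s": min(rates),
        "max_tok_per_s": max(rates),
    }


def fake_generate(prompt: str, max_new_tokens: int) -> int:
    """Stand-in backend (assumption) so the sketch runs on its own."""
    time.sleep(0.01)
    return max_new_tokens


if __name__ == "__main__":
    print(benchmark(fake_generate, "Hello", 64))
```

Reporting the minimum and maximum next to the mean would make it easier to spot thermal throttling during prolonged runs.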

NOTE: Having this information would make it easier for me to decide whether to purchase more of these machines, and I’m sure I’m not the only one.
