Strix‑Halo build of GLM‑5.2‑MXFP4 and simple benchmarks

by neoOpus - opened 13 days ago

Hello, I appreciate you making GLM 5.2 MXFP4 available. Would it be possible for you to share a build or instructions specific to Strix Halo, along with some benchmark numbers so that others can replicate the results? I would like to run this on Strix Halo. Since most of us use this machine as a workstation as well as inference hardware, a version that can also run on Windows or WSL would be greatly appreciated.

Some of the things that would be most helpful:

Install or build: pre-made artifacts or a brief script that demonstrates the precise Strix-Halo conversion and packaging procedures.

Runtime specifics: versions of the operating system, drivers, runtime, compiler, and libraries used.

Quantization: the precise flags or commands you used and the quant modes you tested.

Notes on compatibility: any necessary model modifications, operator adjustments, or unique runtime flags.

Benchmarks: numbers for prolonged runs, including power draw or thermal notes if at all possible.

Precise commands: the exact commands you used for loading, warmup, and timed runs, plus any environment variables you set.
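
To make the request concrete, here is a rough sketch of the kind of warmup-plus-timed-run harness I have in mind for the benchmark numbers. It is only my assumption about a sensible format, not something taken from your setup: `generate` is a hypothetical placeholder for whatever call in your runtime actually produces tokens, and `fake_generate` exists only so the sketch runs without a model.

```python
import statistics
import time


def benchmark(generate, prompt, warmup_runs=2, timed_runs=5):
    """Time a token-generating callable and report tokens per second.

    `generate` is a hypothetical stand-in: any callable that takes a prompt
    and returns the number of tokens it produced.
    """
    # Warmup runs are executed but not measured.
    for _ in range(warmup_runs):
        generate(prompt)

    rates = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)

    return {
        "runs": timed_runs,
        "mean_tok_per_s": statistics.mean(rates),
        "min_tok_per_s": min(rates),
        "max_tok_per_s": max(rates),
    }


def fake_generate(prompt):
    # Placeholder workload (assumption): sleeps briefly and claims 32 tokens.
    time.sleep(0.01)
    return 32


if __name__ == "__main__":
    print(benchmark(fake_generate, "Hello"))
```

Reporting the warmup count, the number of timed runs, and the min/mean/max rate alongside the real commands would make it much easier for the rest of us to compare results.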

NOTE: Having this information would make it easier for me to decide whether to purchase more of these machines, and I’m sure I’m not the only one.
