How to use from the llama-cpp-python library
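# !pip install llama-cpp-python
# Note: per the warning below, stock llama.cpp cannot load this file, so a stock
# llama-cpp-python build (which bundles upstream llama.cpp) is expected to fail here.

from llama_cpp import Llama

# Download the GGUF file from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
    repo_id="Hal0ai/Qwen-AgentWorld-Hal0-35B-A3B-ROCmFP4",
    filename="Qwen-AgentWorld-35B-A3B-ROCmFP4-STRIX_LEAN.gguf",
)

# Run a single-turn chat completion and print the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])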

Qwen-AgentWorld-Hal0-35B-A3B-ROCmFP4

A ROCmFP4 (Q4_0_ROCMFP4_STRIX_LEAN, ~4.29 bpw) GGUF quant of Qwen/Qwen-AgentWorld-35B-A3B, a 35B-A3B Mixture-of-Experts (qwen35moe) world model with a 262K-token context.

Built for AMD Strix Halo (gfx1151) unified-memory inference on the hal0 agent platform.

⚠️ Requires the ROCmFP4 llama.cpp fork — not upstream

ROCmFP4 is an experimental, fork-specific quantization format (UE4M3-scale FP4). It cannot be loaded by stock llama.cpp or standard GGUF tooling; those fail with the error "101 is not a valid GGMLQuantizationType".
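The wording of that error matches what Python's enum machinery raises when a value is missing from an enum. The sketch below illustrates the failure mode with an abbreviated stand-in enum; it is not the real gguf package, and it assumes (from the error text alone) that 101 is the tensor-type ID the fork assigns to its ROCmFP4 format.

```python
from enum import IntEnum


class GGMLQuantizationType(IntEnum):
    """Abbreviated stand-in for a stock tensor-type enum (not the real gguf package)."""
    F32 = 0
    F16 = 1
    Q4_0 = 2
    Q4_1 = 3
    Q8_0 = 8
    BF16 = 30


def describe_tensor_type(type_id: int) -> str:
    """Return the type name, or the lookup error text for an ID the enum lacks."""
    try:
        return GGMLQuantizationType(type_id).name
    except ValueError as exc:
        # Enum lookup by value raises ValueError for unknown IDs.
        return f"unsupported: {exc}"


print(describe_tensor_type(2))    # a type every stock build knows
print(describe_tensor_type(101))  # assumed fork-specific ROCmFP4 type ID
```

Tooling that only knows the upstream type table has no entry for the fork's ID, so it stops at the first tensor that uses it.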

Run it with either:

  • the hal0 Strix Halo toolbox image ghcr.io/hal0ai/amd-strix-halo-toolboxes:rocm-7.2.4-rocmfp4-server, or
  • a build of the rocmfp4-llama fork (branch mtp-rocmfp4-strix), targeting gfx1151.
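Once one of these is running, a client can talk to it over HTTP. The following is a minimal sketch, assuming the server exposes the same OpenAI-compatible /v1/chat/completions endpoint as upstream llama-server and listens on localhost:8080; neither the endpoint nor the port is confirmed for this image or fork, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default llama-server address; adjust to your container's port mapping.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying a single-turn chat payload."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, with a server already running:
#   print(ask("What is the capital of France?"))
```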

Details

Architecture: qwen35moe (35B total / ~3B active, MoE)
Quant: Q4_0_ROCMFP4_STRIX_LEAN (~4.38 bpw target; 4.29 bpw measured)
Recipe: ROCmFP4 experts/FFN + Strix K/V + Q5_K token embeddings
File size: ~17.3 GiB (from 66 GiB BF16)
MTP: No (plain quant, no speculative-decode draft head)
Context: 262144
Base: Qwen/Qwen-AgentWorld-35B-A3B (Apache-2.0)
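
The bits-per-weight and file-size figures above can be roughly cross-checked against each other. This is a back-of-the-envelope sketch that assumes a nominal parameter count of exactly 35e9 (the model's real count is not stated here) and ignores GGUF metadata overhead, so it is only expected to land near the reported numbers.

```python
GIB = 1024 ** 3  # bytes per GiB


def quant_size_gib(n_params: float, bpw: float) -> float:
    """Estimated file size in GiB for n_params weights at bpw bits per weight."""
    return n_params * bpw / 8 / GIB


def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """Average bits per weight implied by a file size and a parameter count."""
    return file_size_gib * GIB * 8 / n_params


NOMINAL_PARAMS = 35e9  # assumption: "35B" taken literally

print(f"estimated size at 4.29 bpw: {quant_size_gib(NOMINAL_PARAMS, 4.29):.2f} GiB")
print(f"bpw implied by 17.3 GiB:    {bits_per_weight(17.3, NOMINAL_PARAMS):.2f}")
```

With the nominal count, the estimate comes out at roughly 17.5 GiB, close to the reported ~17.3 GiB; the gap is consistent with the parameter count being rounded.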

Provenance

Quantized from the BF16 GGUF in unsloth/Qwen-AgentWorld-35B-A3B-GGUF, using that repo's importance matrix (imatrix_unsloth.gguf), via the hal0 ROCmFP4 quantize pipeline (llama-quantize --imatrix ... Q4_0_ROCMFP4_STRIX_LEAN).
