CodeRM-GRPO-Selection-8B — NOESIS AWQ-INT4
AWQ-INT4 (GEMM kernel) quantization of CodeRM-GRPO-Selection-8B, a
code-domain reward model trained with GRPO (Group Relative Policy
Optimization) on top of Qwen/Qwen3-8B.
This bundle is the deployment variant for the NOESIS-VC-ONE platform — it
fits a 6 GB GPU (RTX 3060 Laptop) and serves as the M5-CODE branch
best-of-N selector at inference time.
| Field | Value |
|---|---|
| Architecture | Qwen3ForCausalLM (scoring backbone) |
| Hidden size | 4 096 |
| Layers | 36 |
| Attention heads | 32 |
| KV heads | 8 (GQA) |
| Head dim | 128 |
| Vocab | 151 936 (Qwen3 standard) |
| Context length | 32 768 (positional 40 960) |
| Base model | Qwen/Qwen3-8B (Apache 2.0) |
| Fine-tune method | GRPO (Shao et al., DeepSeekMath / DeepSeek-R1 lineage) |
| Quantization | AWQ INT4, GEMM kernel, group_size=128, zero_point=true |
| Bundle size | ~6.1 GB on disk (down from ~16 GB BF16) |
| Runtime VRAM | ~5.5 GB peak (fits RTX 3060 6 GB) |
| Required runtime | transformers >= 5.8.1 with native AwqConfig |
| License | Apache 2.0 (inherited from Qwen3-8B + GRPO fine-tune) |
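The bundle-size row can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an estimate, not a measurement. It makes three assumptions that this card does not state: an MLP intermediate size of 12 288, untied input and output embeddings that stay in 16-bit, and one FP16 scale plus one 4-bit zero point per group of 128 weights.

```python
# Rough size estimate from the architecture table (a sketch, not a measurement).
HIDDEN, LAYERS, HEADS, KV_HEADS, HEAD_DIM, VOCAB = 4096, 36, 32, 8, 128, 151_936
INTERMEDIATE = 12_288  # assumption: MLP width, not listed in this card
GROUP_SIZE, W_BIT = 128, 4


def linear_params_per_layer() -> int:
    """Weights in the attention and MLP projections of one decoder layer."""
    q_and_o = 2 * HIDDEN * HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections
    return q_and_o + k_and_v + mlp


def estimate_sizes_gb() -> tuple[float, float]:
    """Return (bf16_gb, awq_int4_gb), ignoring small tensors such as norms."""
    linear = LAYERS * linear_params_per_layer()
    # Assumption: embed_tokens and lm_head are separate and kept in 16-bit.
    embed = 2 * VOCAB * HIDDEN
    # Per weight: 4 bits, plus an FP16 scale and a 4-bit zero shared by each group.
    bits_per_weight = W_BIT + (16 + W_BIT) / GROUP_SIZE
    bf16_bytes = 2 * (linear + embed)
    awq_bytes = linear * bits_per_weight / 8 + 2 * embed
    return bf16_bytes / 1e9, awq_bytes / 1e9


bf16_gb, awq_gb = estimate_sizes_gb()
print(f"BF16 ~{bf16_gb:.1f} GB, AWQ INT4 ~{awq_gb:.1f} GB")
```

Under these assumptions the estimate lands close to the ~16 GB and ~6.1 GB figures in the table, which suggests most of the remaining size is the unquantized embedding and output matrices.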
What's in this bundle
| File | Purpose |
|---|---|
| `model-00001-of-00002.safetensors` (4.0 GB) | AWQ-quantized weight shard 1/2 |
| `model-00002-of-00002.safetensors` (2.1 GB) | AWQ-quantized weight shard 2/2 |
| `model.safetensors.index.json` | shard map (qweight / qzeros / scales per Linear) |
| `config.json` | `quantization_config.quant_method="awq"` + AWQ params |
| `tokenizer.json` / `tokenizer_config.json` | Qwen3 BPE tokenizer (vocab 151 936) |
| `chat_template.jinja` | Qwen3 standard chat template |
| `generation_config.json` | inherited defaults |
| `noesis_provenance.json` | full NOESIS provenance (see below) |
| `LICENSE` | Apache 2.0 |
Quantization details (sealed in noesis_provenance.json)
| Parameter | Value |
|---|---|
| Method | AWQ via autoawq |
| Kernel | GEMM |
| `w_bit` | 4 |
| `q_group_size` | 128 |
| `zero_point` | true |
| Calibration samples | 64 |
| Calibration `max_seq_len` | 384 |
| Calibration source | noesis_router_dataset_50k_curated.jsonl |
| RNG seed | 1729 |
| Wall-clock | 57.13 minutes |
| `force_arch_override` | null (auto-detected Qwen3ForCausalLM) |
| NOESIS framework | DHCF-FNO v15.7 |
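To make the `w_bit`, `q_group_size` and `zero_point` parameters concrete, here is a minimal pure-Python sketch of group-wise asymmetric 4-bit quantization. It illustrates the storage format only, using plain round-to-nearest. It does not reproduce AWQ's activation-aware scaling or the autoawq implementation, and the function names and toy weights are illustrative.

```python
import random

W_BIT = 4
GROUP_SIZE = 128
QMAX = (1 << W_BIT) - 1  # 15 for 4-bit codes


def quantize_group(weights: list[float]) -> tuple[list[int], float, int]:
    """Asymmetric round-to-nearest quantization of one weight group."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX or 1.0  # avoid a zero scale for constant groups
    zero = max(0, min(QMAX, round(-lo / scale)))  # integer zero point
    codes = [max(0, min(QMAX, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: list[int], scale: float, zero: int) -> list[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(4 * GROUP_SIZE)]  # toy weight row
max_err = 0.0
for start in range(0, len(row), GROUP_SIZE):
    group = row[start:start + GROUP_SIZE]
    codes, scale, zero = quantize_group(group)
    restored = dequantize_group(codes, scale, zero)
    # Express the error in units of this group's quantization step.
    max_err = max(max_err, max(abs(a - b) / scale for a, b in zip(group, restored)))
print(f"worst error: {max_err:.3f} quantization steps")
```

Each group of 128 weights carries its own scale and zero point, so an outlier only widens the quantization step of its own group rather than of the whole row.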
What GRPO-Selection means
Group Relative Policy Optimization scores candidates relative to the group of competing candidates rather than against an absolute return. For code-reward selection that translates to:
- Sample N candidates from a code-generation expert (e.g. M5-CODE).
- Run each candidate through this reward model → group of N scalar scores.
- Pick `argmax` (best-of-N) or rank-based selection (top-k).
Group-relative scoring is intended to give sharper preference signals than classic absolute-reward PPO on coding tasks, where many candidates are "almost correct" but only a few actually pass tests.
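The selection steps above can be sketched in a few lines. This assumes the reward model has already produced one scalar per candidate; the `raw` values are hypothetical, and `group_relative` follows the group standardization used for GRPO advantages (score minus group mean, divided by group standard deviation) rather than anything specific to this bundle.

```python
from statistics import mean, pstdev


def group_relative(scores: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize scores within one candidate group (zero mean, unit spread)."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


def top_k(scores: list[float], k: int = 1) -> list[int]:
    """Indices of the k highest-scoring candidates, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


raw = [0.12, 0.90, 0.35, 0.88]  # hypothetical reward-model outputs for N=4
print(top_k(raw, k=1))  # best-of-N (argmax) -> [1]
print(top_k(raw, k=2))  # rank-based selection -> [1, 3]
print([round(z, 2) for z in group_relative(raw)])
```

Standardizing within the group does not change which candidate wins, since it is a monotone rescaling. Its use is to make scores comparable across prompts whose raw score ranges differ.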
Quick start
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

bundle = "AMAImedia/CodeRM-GRPO-Selection-8B-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(bundle)
model = AutoModelForCausalLM.from_pretrained(
    bundle,
    device_map={"": 0},   # AWQ kernels prefer single-device load
    dtype=torch.float16,  # AWQ activations are fp16
).eval()


def score(prompt: str, code: str) -> float:
    # Wrap the prompt/candidate pair in the Qwen3 chat format.
    text = (
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n{code}<|im_end|>"
    )
    ids = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**ids).logits[:, -1, :]
    # Score = highest next-token probability at the final position.
    return float(logits.softmax(-1).max())


candidates = [
    "def solve(n):\n    return n + 1",
    "def solve(n):\n    return n * 2",
]
prompt = "Write a function that returns n+1."
scores = [score(prompt, c) for c in candidates]
print("Best:", candidates[scores.index(max(scores))])
```
AWQ runtime note.
`transformers >= 5.8.1` reads the `quantization_config` block in `config.json` and instantiates the AWQ kernels automatically; no `autoawq` import is needed at inference time. The `autoawq` library is only required if you want to re-quantize from BF16 sources.
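Before loading, it can be useful to confirm that a downloaded `config.json` carries the documented AWQ parameters. The sketch below is a hypothetical helper, not part of this bundle. The key names `bits`, `group_size`, `zero_point` and `version` are an assumption based on how AWQ checkpoints commonly serialize their settings, so adjust them if the actual file differs.

```python
import json

# Assumption: key names as commonly found in AWQ checkpoints, not confirmed by this card.
EXPECTED = {"quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": True}


def check_awq_config(config: dict) -> list[str]:
    """Return mismatches between a parsed config.json and the documented AWQ params."""
    qcfg = config.get("quantization_config")
    if qcfg is None:
        return ["quantization_config block is missing"]
    problems = []
    for key, want in EXPECTED.items():
        got = qcfg.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems


# Stand-in for the real file; in practice, parse the bundle's config.json instead.
sample = json.loads(
    '{"quantization_config": {"quant_method": "awq", "bits": 4,'
    ' "group_size": 128, "zero_point": true, "version": "gemm"}}'
)
print(check_awq_config(sample) or "AWQ config matches the documented parameters")
```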
NOESIS integration
In the NOESIS-VC-ONE platform CodeRM serves as the code-domain selector inside the M5-CODE branch:
M5-CODE generates N candidates
│
▼
CodeRM-GRPO-Selection-8B-NOESIS-AWQ-INT4 ← (this bundle)
│ group-relative scores
▼
Orchestrator picks argmax
│
▼
QC-4B verifies executability
Apache 2.0 lineage end-to-end keeps this branch commercial-clean.
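The flow in the diagram can be sketched as a small orchestration function. Everything here is illustrative: `select_best`, `toy_score` and `toy_verify` are hypothetical stand-ins for the orchestrator, this reward model and QC-4B. Falling back to the next-ranked candidate when verification fails is an assumption, since the card does not say what happens in that case. The cap of 8 candidates is the production limit stated in this card.

```python
from typing import Callable, Optional, Sequence

MAX_CANDIDATES = 8  # production best-of-N cap stated in this card


def select_best(
    prompt: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    verify: Callable[[str], bool],
) -> Optional[str]:
    """Rank candidates by score and return the best one that passes verification."""
    pool = list(candidates)[:MAX_CANDIDATES]
    ranked = sorted(pool, key=lambda c: score(prompt, c), reverse=True)
    for candidate in ranked:
        # Assumption: on a failed check, fall through to the next-ranked candidate.
        if verify(candidate):
            return candidate
    return None


def toy_score(prompt: str, code: str) -> float:
    # Hypothetical stand-in for the reward model: favors the expected expression.
    return 1.0 if "n + 1" in code else 0.0


def toy_verify(code: str) -> bool:
    # Hypothetical stand-in for QC-4B: only checks that the code parses.
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


candidates = ["def solve(n):\n    return n * 2", "def solve(n):\n    return n + 1"]
print(select_best("Write a function that returns n+1.", candidates, toy_score, toy_verify))
```

Passing the scorer and verifier as callables keeps the selection logic independent of how the models are loaded, so the same function can be exercised with stubs like these or with the real `score` function from the quick start.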
Sealed NOESIS rules
- `R-APACHE-CLEAN`: Apache 2.0 preserved (Qwen3 base + GRPO fine-tune + AWQ quant).
- `R-REWARD-MODEL-FROZEN`: reward model is frozen during inference; no gradient feedback into M5-CODE at production runtime.
- `R-BEST-OF-N-CAP`: production best-of-N selection is capped at N=8 to bound VRAM / latency on RTX 3060.
- `R-AWQ-DEVICE-MAP-SINGLE`: AWQ kernels require `device_map={"":0}`, never `"auto"` (mirrors the NF4 rule and applies for the same reason: kernel expects the full computation graph on one device).
Provenance
| Step | Source / output |
|---|---|
| Base weights | Qwen/Qwen3-8B (© Alibaba / Qwen Team, Apache 2.0) |
| Fine-tune | GRPO on code-reward dataset (author's pipeline) |
| Source format | BF16 native safetensors × 4 shards (~16 GB) |
| Quantization | AWQ INT4 GEMM via autoawq, group_size=128, w_bit=4 |
| Bundle | this repo — AMAImedia/CodeRM-GRPO-Selection-8B-NOESIS-AWQ-INT4 |
| NOESIS slot | M5-CODE branch reward-selection head |
License
Apache License 2.0. Qwen3-8B base © Alibaba Cloud / Qwen Team
(2025-2026). GRPO fine-tune © CodeRM-GRPO-Selection author(s). NOESIS
AWQ-INT4 quantization layer © AMAImedia 2026 (NOESIS DHCF-FNO project).
See LICENSE.
NOESIS DHCF-FNO framework — AMAImedia.com.
BF16 source bundle: AMAImedia/CodeRM-GRPO-Selection-8B (if/when published).