gpt-oss-20b-reap-0.4-mxfp4-gguf

This repository contains a GGUF-quantized version of the sandeshrajx/gpt-oss-20b-reap-0.4-mxfp4 model.

Model Description

This model is a GGUF (Q8_0) quantization of the REAP-pruned, MXFP4-quantized openai/gpt-oss-20b model.

  • Original Model: openai/gpt-oss-20b
  • Pruning Method: REAP with a compression ratio of 0.4
  • First Quantization Method: MXFP4 weight-only quantization
  • Second Quantization Method: GGUF (Q8_0) using llama.cpp
  • Calibration dataset used for pruning: theblackcat102/evol-codealpaca-v1

The original MXFP4 quantization targeted only the model's "expert" layers, skipping the self-attention and router layers, as is standard practice for Mixture-of-Experts (MoE) models: it reduces size while preserving quality in the most sensitive components. The GGUF conversion then packages the model for efficient inference with llama.cpp.

Usage

You can use this model with llama.cpp or any other GGUF-compatible loader.
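A minimal invocation with llama.cpp's CLI might look like the following sketch. The GGUF filename matches the one produced by the conversion command further down; the binary path assumes a standard CMake build of llama.cpp, so adjust both to your setup:

```shell
# Sketch, assuming llama.cpp was built with CMake and the GGUF file
# was downloaded into the current directory.
# -m: model path, -p: prompt, -n: number of tokens to generate
./llama.cpp/build/bin/llama-cli \
    -m gpt-oss-20b-reap-0.4-mxfp4-q8_0.gguf \
    -p "Write a Python function that reverses a string." \
    -n 256
```

For an OpenAI-compatible HTTP endpoint, llama.cpp's llama-server accepts the same `-m` flag.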

Quantization Details

The model was first pruned with REAP at a 0.4 compression ratio, then quantized to MXFP4, and finally converted to GGUF (Q8_0) format using the llama.cpp conversion script.

Pruning Commands Used:

# Step 1: observer pass only — collect expert activation statistics
# over 32 samples per category of the calibration dataset
python ./reap/src/reap/prune.py \
    --model-name "openai/gpt-oss-20b" \
    --run_observer_only true \
    --samples_per_category 32

# Step 2: prune the experts using the collected statistics
python ./reap/src/reap/prune.py \
    --model-name "openai/gpt-oss-20b" \
    --compression-ratio 0.4 \
    --prune-method reap

MXFP4 Quantization Command Used:

# Convert the pruned checkpoint to MXFP4 weight-only format
python Model-Optimizer/examples/gpt-oss/convert_oai_mxfp4_weight_only.py \
    --model_path /workspace/artifacts/gpt-oss-20b/evol-codealpaca-v1/pruned_models/reap-seed_42-0.4-mxfp4 \
    --output_path /workspace/artifacts/gpt-oss-20b/evol-codealpaca-v1/pruned_models/reap-seed_42-0.4-mxfp4-quantized

GGUF Quantization Command Used:

# Convert the Hugging Face checkpoint to GGUF, quantizing to Q8_0
python llama.cpp/convert_hf_to_gguf.py \
    --outtype q8_0 \
    --outfile /path/to/output/gpt-oss-20b-reap-0.4-mxfp4-q8_0.gguf \
    /path/to/downloaded/mxfp4_model
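As a quick sanity check after conversion, you can confirm the output file begins with the 4-byte ASCII magic "GGUF" that every valid GGUF file carries. The path below is a placeholder; point it at your converted file:

```shell
# Placeholder path — point MODEL at your converted file.
MODEL="${MODEL:-/path/to/output/gpt-oss-20b-reap-0.4-mxfp4-q8_0.gguf}"

# Every valid GGUF file begins with the 4-byte ASCII magic "GGUF".
if [ "$(head -c 4 "$MODEL" 2>/dev/null)" = "GGUF" ]; then
    echo "GGUF magic OK"
else
    echo "not a GGUF file (or file missing)" >&2
fi
```

This only checks the header, not the tensor data; for a full inspection, llama.cpp ships a gguf-py package with dump utilities.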

License

(Please specify the license of the original model and any modifications)

Model Details

  • Format: GGUF, 8-bit (Q8_0)
  • Model size: 14B params
  • Architecture: gpt-oss