# Step-3.5-Flash-GGUF-imatrix
This repo contains GGUF weights for stepfun-ai/Step-3.5-Flash, quantized with an importance matrix (imatrix). Tested on AMD Strix Halo.
This imatrix Q4_K_S version is ~7 GB smaller than the plain Q4_K_M, yet its perplexity (PPL) is slightly better:
```
104G  step-3.5-flash-q4_k_s.gguf   # THIS MODEL
Final estimate: PPL = 2.4130 +/- 0.01081

111G  step35-q4_k_m.gguf           # BASE QUANT WITHOUT IMATRIX
Final estimate: PPL = 2.4177 +/- 0.01091
```
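To reproduce such a comparison yourself, llama.cpp ships a `llama-perplexity` tool. A minimal sketch follows; the evaluation corpus used for the numbers above isn't stated, so `wiki.test.raw` below is a placeholder:

```bash
# Hedged sketch: run a perplexity test with llama.cpp.
# wiki.test.raw is a placeholder corpus, not necessarily the one used above.
./llama-perplexity -m step-3.5-flash-q4_k_s-00001-of-00003.gguf \
    -f wiki.test.raw -ngl 99
```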
## Quantization Details
- Method: `llama-quantize`
- llama.cpp version: 7966 (8872ad212)
- Original model precision: BF16
- imatrix calibration data: wikitext-103-raw-v1
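For reference, a typical llama.cpp imatrix quantization pipeline looks like the sketch below. The filenames are illustrative assumptions; only the tools, calibration set, and quant type match what is listed above.

```bash
# Hedged sketch of an imatrix quantization pipeline with llama.cpp.
# Filenames are illustrative; these are not the exact commands used for this repo.

# 1. Collect activation statistics over the calibration text (wikitext-103-raw-v1)
./llama-imatrix -m step-3.5-flash-bf16.gguf -f wikitext-103-raw-v1.txt -o step35.imatrix

# 2. Quantize to Q4_K_S using the importance matrix
./llama-quantize --imatrix step35.imatrix step-3.5-flash-bf16.gguf \
    step-3.5-flash-q4_k_s.gguf Q4_K_S
```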
## Files Provided
| File | Quant Method | Size | Description |
|---|---|---|---|
| step-3.5-flash-q4_k_s-0000{1..3}-of-00003.gguf | Q4_K_S | 104 GB | High quality, with imatrix; great for Strix Halo |
## Usage
You can run these models with llama.cpp, for example via `llama-server`. Point `-m` at the first split file; the remaining parts are loaded automatically:

```
./llama-server -m step-3.5-flash-q4_k_s-00001-of-00003.gguf --no-mmap -ngl 99 --port 8080 -c 0 -fa 1 --jinja
```
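Once the server is running, it exposes an OpenAI-compatible API. A minimal sketch of a request against the chat endpoint (the prompt is a placeholder):

```bash
# Minimal sketch: query llama-server's OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7
      }'
```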