# Step-3.5-Flash-GGUF-imatrix
This repo contains GGUF weights for stepfun-ai/Step-3.5-Flash, quantized with an importance matrix (imatrix). Tested on AMD Strix Halo.
This imatrix Q4_K_S version is ~7 GB smaller than the plain Q4_K_M, yet its perplexity (PPL) is slightly better:
```
104G  step-3.5-flash-q4_k_s.gguf   # THIS MODEL
Final estimate: PPL = 2.4130 +/- 0.01081

111G  step35-q4_k_m.gguf           # BASE QUANT WITHOUT IMATRIX
Final estimate: PPL = 2.4177 +/- 0.01091
```
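To reproduce such a comparison yourself, llama.cpp ships a `llama-perplexity` tool. A minimal sketch follows; the evaluation corpus used for the numbers above isn't stated, so `wiki.test.raw` below is a placeholder:

```bash
# Hedged sketch: run a perplexity test with llama.cpp.
# wiki.test.raw is a placeholder corpus, not necessarily the one used above.
./llama-perplexity -m step-3.5-flash-q4_k_s-00001-of-00003.gguf \
    -f wiki.test.raw -ngl 99
```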
## Quantization Details
- Method: `llama-quantize`
- llama.cpp version: 7966 (8872ad212)
- Original model precision: BF16
- imatrix calibration data: wikitext-103-raw-v1
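For reference, a typical llama.cpp imatrix quantization pipeline looks like the sketch below. The filenames are illustrative assumptions; only the tools, calibration set, and quant type match what is listed above.

```bash
# Hedged sketch of an imatrix quantization pipeline with llama.cpp.
# Filenames are illustrative; these are not the exact commands used for this repo.

# 1. Collect activation statistics over the calibration text (wikitext-103-raw-v1)
./llama-imatrix -m step-3.5-flash-bf16.gguf -f wikitext-103-raw-v1.txt -o step35.imatrix

# 2. Quantize to Q4_K_S using the importance matrix
./llama-quantize --imatrix step35.imatrix step-3.5-flash-bf16.gguf \
    step-3.5-flash-q4_k_s.gguf Q4_K_S
```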
## Files Provided
| File | Quant Method | Size | Description |
|---|---|---|---|
| step-3.5-flash-q4_k_s-0000{1..3}-of-00003.gguf | Q4_K_S | 104 GB | High quality, with imatrix; great for Strix Halo |
## Usage
You can run these models with llama.cpp, for example via `llama-server`. Point `-m` at the first split file; the remaining parts are loaded automatically:

```
./llama-server -m step-3.5-flash-q4_k_s-00001-of-00003.gguf --no-mmap -ngl 99 --port 8080 -c 0 -fa 1 --jinja
```
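Once the server is running, it exposes an OpenAI-compatible API. A minimal sketch of a request against the chat endpoint (the prompt is a placeholder):

```bash
# Minimal sketch: query llama-server's OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7
      }'
```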