Step-3.5-Flash-GGUF-imatrix

This repo contains GGUF weights for stepfun-ai/Step-3.5-Flash, quantized with an importance matrix (imatrix). Tested on AMD Strix Halo.

This imatrix Q4_K_S quant is ~7 GB smaller than the plain Q4_K_M, yet its perplexity (PPL) is still slightly better:

104G step-3.5-flash-q4_k_s.gguf   # THIS MODEL (Q4_K_S with imatrix)
     Final estimate: PPL = 2.4130 +/- 0.01081
111G step35-q4_k_m.gguf           # BASE QUANT WITHOUT IMATRIX
     Final estimate: PPL = 2.4177 +/- 0.01091

Quantization Details

  • Method: llama-quantize
  • Llama.cpp Version: 7966 (8872ad212)
  • Original Model Precision: BF16
  • Importance Matrix: computed on the wikitext-103-raw-v1 dataset
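For reference, the imatrix-based quantization above can be reproduced roughly as follows with the llama.cpp tools. This is a sketch, not the exact command line used for this repo; the file names (the BF16 source GGUF and the wikitext calibration text) are assumptions.

```shell
# 1. Compute the importance matrix from the BF16 model using a calibration
#    text (here: a local dump of wikitext-103-raw-v1; path is an assumption).
./llama-imatrix -m step-3.5-flash-bf16.gguf \
    -f wikitext-103-raw-v1.txt \
    -o imatrix.gguf

# 2. Quantize to Q4_K_S, guiding the quantizer with the importance matrix.
./llama-quantize --imatrix imatrix.gguf \
    step-3.5-flash-bf16.gguf \
    step-3.5-flash-q4_k_s.gguf \
    Q4_K_S
```

The imatrix tells the quantizer which weights matter most on real text, which is why the smaller Q4_K_S here can edge out a plain Q4_K_M on perplexity.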

Files Provided

File                                             Quant Method   Size     Description
step-3.5-flash-q4_k_s-0000{1..3}-of-00003.gguf   Q4_K_S         104 GB   High quality, with imatrix; great for Strix Halo

Usage

You can use these models with llama.cpp:

./llama-server -m step-3.5-flash-q4_k_s-00001-of-00003.gguf --no-mmap -ngl 99 --port 8080 -c 0 -fa 1 --jinja
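Once llama-server is running, it exposes an OpenAI-compatible HTTP API on the chosen port. A minimal request against the command above (port 8080, localhost assumed) looks like:

```shell
# Send a chat completion request to the local llama-server instance.
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [
            {"role": "user", "content": "Hello, who are you?"}
          ]
        }'
```

The `--jinja` flag above enables the chat template embedded in the GGUF, so the server formats the messages correctly for this model.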
