Instructions to use olka-fi/Step-3.7-Flash-MXFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use olka-fi/Step-3.7-Flash-MXFP4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="olka-fi/Step-3.7-Flash-MXFP4", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run the pipeline and print the generated reply
print(pipe(text=messages))

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("olka-fi/Step-3.7-Flash-MXFP4", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use olka-fi/Step-3.7-Flash-MXFP4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "olka-fi/Step-3.7-Flash-MXFP4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "olka-fi/Step-3.7-Flash-MXFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
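
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "olka-fi/Step-3.7-Flash-MXFP4"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat(build_chat_payload("Describe this image in one sentence.", "<image URL>")))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 instead, since both expose the same OpenAI-compatible route.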

Use Docker

docker model run hf.co/olka-fi/Step-3.7-Flash-MXFP4

SGLang

How to use olka-fi/Step-3.7-Flash-MXFP4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "olka-fi/Step-3.7-Flash-MXFP4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "olka-fi/Step-3.7-Flash-MXFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "olka-fi/Step-3.7-Flash-MXFP4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "olka-fi/Step-3.7-Flash-MXFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use olka-fi/Step-3.7-Flash-MXFP4 with Docker Model Runner:
```
docker model run hf.co/olka-fi/Step-3.7-Flash-MXFP4
```

Step-3.7-Flash-MXFP4 / quantize.log

olka-fi

Add files using upload-large-folder tool

a405d86 verified 6 days ago

	Input format: FP8
	Quant format: MXFP4
	Output format: ct
	Shards: 26
	Workers: 8 × 3 threads
	Scale percentile: 99.5
	Include patterns: ['moe.gate_proj', 'moe.up_proj', 'moe.down_proj']
	(--exclude_layers ignored)
	MSE scale select: enabled (3 candidates per block)
	Loading input_layernorm.weight tensors for γ-weighted MSE...
	γ found for 48 layers (layers 0-47)
	Zero-copy: disabled (FP8 models […]
	/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 5 leaked semaphore objects to clean up at shutdown
	warnings.warn('resource_tracker: There appear to be %d '
	[…] with γ)
	model-00003.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00015.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00023.safetensors: 4716MB → 2592MB (3 quantized, 2 with γ)
	model-00002.safetensors: 5279MB → 3155MB (3 quantized, 2 with γ)
	model-00010.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00018.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-vit-00002.safetensors: 2348MB → 2348MB (0 quantized, 0 with γ)
	model-00006.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00011.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00019.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00004.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00012.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00020.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00007.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00016.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00024.safetensors: 6968MB → 6968MB (0 quantized, 0 with γ)
	model-00005.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00013.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00021.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00001.safetensors: 924MB → 924MB (0 quantized, 0 with γ)
	model-00009.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
	model-00014.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
	model-00022.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
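
The per-shard size lines above can be tallied with a short script. This is a sketch only: the regular expression is inferred from the line format visible in this log, the helper name `summarize_shards` is hypothetical, and the sample text is a three-line excerpt rather than the full log.

```python
import re

# Pattern inferred from the per-shard lines in this log (an assumption, not a documented format).
SHARD_LINE = re.compile(
    r"^\s*(?P<name>\S+\.safetensors): (?P<before>\d+)MB → (?P<after>\d+)MB "
    r"\((?P<quantized>\d+) quantized, (?P<gamma>\d+) with γ\)\s*$"
)


def summarize_shards(log_text: str) -> dict:
    """Sum the before/after sizes over every per-shard line found in log_text."""
    before = after = shards = 0
    for line in log_text.splitlines():
        match = SHARD_LINE.match(line)
        if match is None:
            continue  # skip progress lines, warnings, and truncated entries
        shards += 1
        before += int(match["before"])
        after += int(match["after"])
    ratio = after / before if before else float("nan")
    return {"shards": shards, "before_mb": before, "after_mb": after, "ratio": ratio}


sample = """
    model-00003.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
    model-00023.safetensors: 4716MB → 2592MB (3 quantized, 2 with γ)
    model-vit-00002.safetensors: 2348MB → 2348MB (0 quantized, 0 with γ)
"""
summary = summarize_shards(sample)
print(summary)
```

Lines that do not match the pattern, such as the truncated entry interleaved with the `resource_tracker` warning, are skipped, so a total computed this way can fall short of the figure the tool itself reports.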
	[1/26] model-00001.safetensors done (0% | elapsed 2s | ETA 8m46s)
	[2/26] model-00002.safetensors done (3% | elapsed 20s | ETA 11m27s)
	[3/26] model-00006.safetensors done (7% | elapsed 31s | ETA 6m29s)
	[4/26] model-00004.safetensors done (12% | elapsed 33s | ETA 4m09s)
	[5/26] model-00005.safetensors done (16% | elapsed 36s | ETA 3m04s)
	[6/26] model-00009.safetensors done (21% | elapsed 39s | ETA 2m31s)
	[7/26] model-00003.safetensors done (25% | elapsed 41s | ETA 2m01s)
	[8/26] model-00007.safetensors done (30% | elapsed 43s | ETA 1m41s)
	[9/26] model-00008.safetensors done (34% | elapsed 44s | ETA 1m25s)
	[10/26] model-00010.safetensors done (39% | elapsed 47s | ETA 1m14s)
	[11/26] model-00011.safetensors done (43% | elapsed 49s | ETA 1m04s)
	[12/26] model-00012.safetensors done (48% | elapsed 52s | ETA 0m57s)
	[13/26] model-00013.safetensors done (52% | elapsed 55s | ETA 0m50s)
	[14/26] model-00014.safetensors done (57% | elapsed 58s | ETA 0m44s)
	[15/26] model-00015.safetensors done (61% | elapsed 60s | ETA 0m38s)
	[16/26] model-00016.safetensors done (66% | elapsed 63s | ETA 0m33s)
	[17/26] model-00017.safetensors done (70% | elapsed 67s | ETA 0m28s)
	[18/26] model-00018.safetensors done (75% | elapsed 70s | ETA 0m23s)
	[19/26] model-vit-00001.safetensors done (75% | elapsed 72s | ETA 0m23s)
	[20/26] model-00023.safetensors done (78% | elapsed 72s | ETA 0m20s)
	[21/26] model-vit-00002.safetensors done (79% | elapsed 74s | ETA 0m20s)
	[22/26] model-00019.safetensors done (83% | elapsed 75s | ETA 0m15s)
	[23/26] model-00020.safetensors done (88% | elapsed 76s | ETA 0m10s)
	[24/26] model-00021.safetensors done (92% | elapsed 78s | ETA 0m06s)
	[25/26] model-00022.safetensors done (97% | elapsed 82s | ETA 0m02s)
	[26/26] model-00024.safetensors done (100% | elapsed 83s | ETA 0m00s)
	Copied special_tokens_map.json
	Copied .gitattributes
	Copied tokenizer.json
	Copied vision_encoder.py
	Copied tokenizer_config.json
	Copied configuration_step3p7.py
	Copied README.md
	Copied model.safetensors.index.json
	Copied chat_template.jinja
	Copied config.json
	Copied download.log
	Copied modeling_step3p7.py
	Copied processing_step3.py
	Index: 73921 tensors across 26 shards

	Done! 212.5GB → 123.3GB (58.0%)
	Output: /mnt/storage/stepfun-mxfp4
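
The log names the quantization format (MXFP4) and an MSE-based scale selection with 3 candidates per block, but it does not show how the tool implements either. The sketch below illustrates the general idea under stated assumptions: a 32-element block with one shared power-of-two scale and FP4 (E2M1) elements, as in the OCP Microscaling format; three candidate exponents taken as the baseline exponent and its two neighbours (an assumption); and an optional per-element weight standing in for the γ weighting (also an assumption). The 99.5 scale percentile from the log is not modelled, ties round toward the smaller magnitude for simplicity, and the function names are hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # MXFP4 shares one power-of-two scale across 32 elements


def round_to_fp4(value: float) -> float:
    """Round to the nearest FP4 magnitude, keeping the sign and saturating at 6."""
    magnitude = min(FP4_GRID, key=lambda g: abs(abs(value) - g))
    return math.copysign(magnitude, value)


def quantize_block(block, weights=None, exponent_offsets=(-1, 0, 1)):
    """Return (shared exponent, dequantized values) with the lowest weighted squared error."""
    weights = weights or [1.0] * len(block)
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Baseline exponent: align the block maximum with FP4's top binade (2**2).
    base = math.floor(math.log2(amax)) - 2
    best = None
    for offset in exponent_offsets:  # hypothetical candidate set
        scale = 2.0 ** (base + offset)
        dequant = [round_to_fp4(x / scale) * scale for x in block]
        error = sum(w * (x - q) ** 2 for w, x, q in zip(weights, block, dequant))
        if best is None or error < best[0]:
            best = (error, base + offset, dequant)
    return best[1], best[2]


# A block whose values already lie on the FP4 grid is reproduced exactly.
block = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0] * (BLOCK_SIZE // 8)
exponent, dequantized = quantize_block(block)
print(exponent, dequantized == block)
```

With 4 bits per element and one 8-bit scale per 32 elements, this layout costs 4.25 bits per quantized weight. Only the `moe.gate_proj`, `moe.up_proj`, and `moe.down_proj` tensors are quantized in this run, so the overall size reduction is smaller than that figure alone would suggest.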