Image-Text-to-Text
Transformers
Safetensors
English
step3p7
text-generation
vision-language
multimodal
Mixture of Experts
mxfp4
compressed-tensors
quantized
vllm
conversational
custom_code
8-bit precision
Instructions for using olka-fi/Step-3.7-Flash-MXFP4 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use olka-fi/Step-3.7-Flash-MXFP4 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="olka-fi/Step-3.7-Flash-MXFP4", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```
```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("olka-fi/Step-3.7-Flash-MXFP4", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use olka-fi/Step-3.7-Flash-MXFP4 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "olka-fi/Step-3.7-Flash-MXFP4"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "olka-fi/Step-3.7-Flash-MXFP4",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/olka-fi/Step-3.7-Flash-MXFP4
```
- SGLang
How to use olka-fi/Step-3.7-Flash-MXFP4 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "olka-fi/Step-3.7-Flash-MXFP4" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "olka-fi/Step-3.7-Flash-MXFP4",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "olka-fi/Step-3.7-Flash-MXFP4" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "olka-fi/Step-3.7-Flash-MXFP4",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use olka-fi/Step-3.7-Flash-MXFP4 with Docker Model Runner:
```shell
docker model run hf.co/olka-fi/Step-3.7-Flash-MXFP4
```
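The curl calls in the vLLM and SGLang sections above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_payload` and `chat` are illustrative and not part of any library, and it assumes a server is already running on the port used in the commands above.

```python
import json
import urllib.request

MODEL_ID = "olka-fi/Step-3.7-Flash-MXFP4"


def build_payload(prompt, image_url, model=MODEL_ID):
    """Build the same chat-completions body that the curl examples send."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=120):
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the generated text in the first choice.
    return body["choices"][0]["message"]["content"]
```

With a vLLM server started as shown above, `chat("http://localhost:8000", build_payload("Describe this image in one sentence.", image_url))` would send the request; for the SGLang commands the base URL would be `http://localhost:30000`.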
```text
Input format: FP8
Quant format: MXFP4
Output format: ct
Shards: 26
Workers: 8 × 3 threads
Scale percentile: 99.5
Include patterns: ['moe.gate_proj', 'moe.up_proj', 'moe.down_proj']
(--exclude_layers ignored)
MSE scale select: enabled (3 candidates per block)
Loading input_layernorm.weight tensors for γ-weighted MSE...
γ found for 48 layers (layers 0-47)
Zero-copy: disabled (FP8 models
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 5 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
ith γ)
model-00003.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00015.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00023.safetensors: 4716MB → 2592MB (3 quantized, 2 with γ)
model-00002.safetensors: 5279MB → 3155MB (3 quantized, 2 with γ)
model-00010.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00018.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-vit-00002.safetensors: 2348MB → 2348MB (0 quantized, 0 with γ)
model-00006.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00011.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00019.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00004.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00012.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00020.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00007.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00016.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00024.safetensors: 6968MB → 6968MB (0 quantized, 0 with γ)
model-00005.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00013.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00021.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00001.safetensors: 924MB → 924MB (0 quantized, 0 with γ)
model-00009.safetensors: 9500MB → 5251MB (6 quantized, 4 with γ)
model-00014.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
model-00022.safetensors: 9567MB → 5319MB (6 quantized, 4 with γ)
[1/26] model-00001.safetensors done (0% | elapsed 2s | ETA 8m46s)
[2/26] model-00002.safetensors done (3% | elapsed 20s | ETA 11m27s)
[3/26] model-00006.safetensors done (7% | elapsed 31s | ETA 6m29s)
[4/26] model-00004.safetensors done (12% | elapsed 33s | ETA 4m09s)
[5/26] model-00005.safetensors done (16% | elapsed 36s | ETA 3m04s)
[6/26] model-00009.safetensors done (21% | elapsed 39s | ETA 2m31s)
[7/26] model-00003.safetensors done (25% | elapsed 41s | ETA 2m01s)
[8/26] model-00007.safetensors done (30% | elapsed 43s | ETA 1m41s)
[9/26] model-00008.safetensors done (34% | elapsed 44s | ETA 1m25s)
[10/26] model-00010.safetensors done (39% | elapsed 47s | ETA 1m14s)
[11/26] model-00011.safetensors done (43% | elapsed 49s | ETA 1m04s)
[12/26] model-00012.safetensors done (48% | elapsed 52s | ETA 0m57s)
[13/26] model-00013.safetensors done (52% | elapsed 55s | ETA 0m50s)
[14/26] model-00014.safetensors done (57% | elapsed 58s | ETA 0m44s)
[15/26] model-00015.safetensors done (61% | elapsed 60s | ETA 0m38s)
[16/26] model-00016.safetensors done (66% | elapsed 63s | ETA 0m33s)
[17/26] model-00017.safetensors done (70% | elapsed 67s | ETA 0m28s)
[18/26] model-00018.safetensors done (75% | elapsed 70s | ETA 0m23s)
[19/26] model-vit-00001.safetensors done (75% | elapsed 72s | ETA 0m23s)
[20/26] model-00023.safetensors done (78% | elapsed 72s | ETA 0m20s)
[21/26] model-vit-00002.safetensors done (79% | elapsed 74s | ETA 0m20s)
[22/26] model-00019.safetensors done (83% | elapsed 75s | ETA 0m15s)
[23/26] model-00020.safetensors done (88% | elapsed 76s | ETA 0m10s)
[24/26] model-00021.safetensors done (92% | elapsed 78s | ETA 0m06s)
[25/26] model-00022.safetensors done (97% | elapsed 82s | ETA 0m02s)
[26/26] model-00024.safetensors done (100% | elapsed 83s | ETA 0m00s)
Copied special_tokens_map.json
Copied .gitattributes
Copied tokenizer.json
Copied vision_encoder.py
Copied tokenizer_config.json
Copied configuration_step3p7.py
Copied README.md
Copied model.safetensors.index.json
Copied chat_template.jinja
Copied config.json
Copied download.log
Copied modeling_step3p7.py
Copied processing_step3.py
Index: 73921 tensors across 26 shards
Done! 212.5GB → 123.3GB (58.0%)
Output: /mnt/storage/stepfun-mxfp4
```
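The log above mentions "MSE scale select: enabled (3 candidates per block)" and a γ-weighted MSE, but the conversion script itself is not shown. The sketch below illustrates one way such a selection could work for a single MXFP4 block (32 FP4 E2M1 elements sharing a power-of-two scale). Several parts are assumptions rather than the author's method: the three candidates are taken to be the standard shared exponent and its two neighbours, γ is assumed to weight each element's squared error by γ², and the block maximum is used directly instead of the 99.5 scale percentile from the log. The function names are hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is a separate bit).
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # MXFP4 shares one power-of-two scale across 32 elements


def round_to_fp4(x):
    """Round x to the nearest FP4 value; magnitudes beyond 6 saturate to 6."""
    mag = min(FP4_MAGNITUDES, key=lambda m: abs(abs(x) - m))
    return math.copysign(mag, x)


def quantize_block(block, gamma=None, offsets=(-1, 0, 1)):
    """Pick the shared exponent with the lowest (optionally weighted) squared error.

    Assumed, not taken from the log: the candidates are the standard exponent
    plus the given offsets, and gamma[i] ** 2 weights element i's error.
    Returns (exponent, dequantized_values).
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    weights = [1.0] * len(block) if gamma is None else [g * g for g in gamma]
    # Standard choice: align the block maximum with FP4's top binade (6 = 1.5 * 2**2).
    base = math.floor(math.log2(amax)) - 2
    best = None
    for offset in offsets:
        exponent = max(-127, min(127, base + offset))  # E8M0 scale range
        scale = 2.0 ** exponent
        dequant = [round_to_fp4(v / scale) * scale for v in block]
        error = sum(w * (v - d) ** 2 for w, v, d in zip(weights, block, dequant))
        if best is None or error < best[0]:
            best = (error, exponent, dequant)
    return best[1], best[2]
```

Trying more than one exponent can pay off: for a block whose largest value is 7.5, the standard exponent saturates that value to 6, while the next exponent up stores it as 8, which has a smaller squared error. On storage, 4 bits per element plus one 8-bit scale per 32 elements comes to 4.25 bits per weight, so a fully quantized FP8 tensor would shrink to roughly 53% of its size; the 58.0% overall figure in the log is consistent with some tensors being left unquantized.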