---
language:
- en
- zh
license: apache-2.0
base_model: stepfun-ai/Step-3.7-Flash-NVFP4
pipeline_tag: image-text-to-text
library_name: mlx
tags:
- mlx
- jang
- jang-k
- stepfun
- vision-language
---

# Step-3.7-Flash-JANG_K

JANG affine conversion of [stepfun-ai/Step-3.7-Flash-NVFP4](https://huggingface.co/stepfun-ai/Step-3.7-Flash-NVFP4).

This JANG_K variant keeps the proven Step JANG text runtime path and uses the routed expert policy:

```text
gate_proj / up_proj / down_proj = 4 / 2 / 2
```

It is the affine K-lane comparison point for the experimental Step `JANGTQ_2K` work.

## Status

Verified locally:

- 58 safetensors shards
- 2,570 indexed tensors
- no raw NVFP4 `weight_scale`, `weight_scale_2`, or `input_scale` sidecars in the output index
- `jang_config.json` capability verification passes
- text generation proof passes through the bundled `step3p7_mlx.py` bridge

Text proof:

```json
{
  "prompt": "What is 2+2? Answer with only the number.",
  "output": "The user is asking \"What is 2+2? Answer with only the number.\" So the answer is 4. The user wants only the number, so I should just output \"4\".\\n\\n4",
  "prompt_tokens": 26,
  "generated_tokens": 43,
  "contains_final_4": true
}
```

Warmed decode proof:

```json
{
  "measured_tokens": 32,
  "decode_s": 0.8008251190185547,
  "tok_s": 39.95878655656726
}
```

## Format

- Format: JANG affine
- Profile: `JANG_K`
- Routed expert policy: `gate_proj=4`, `up_proj=2`, `down_proj=2`
- Attention, router gates, dense/shared MLP, embeddings, and lm head follow the proven Step JANG_2L runtime policy
- Vision/projector tensors are included as F16 passthrough
- Audio tensors: none in the source checkpoint
- MTP tensors: none in the source checkpoint

## Runtime

The bundled `step3p7_mlx.py` bridge maps the nested Step3p7 text config to MLX's Step3p5 text runtime and drops vision tensors for text-only generation.

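
As a rough illustration of what "maps the nested text config" and "drops vision tensors" can look like, here is a minimal sketch. It is not the code in `step3p7_mlx.py`. The `text_config` key, the tensor name prefixes in `VISION_PREFIXES`, and both helper names are assumptions made for the example; the real bridge may identify these pieces differently.

```python
from typing import Any, Dict, Tuple

# Hypothetical name prefixes for vision-side tensors (assumption, not from the bundle).
VISION_PREFIXES: Tuple[str, ...] = ("vision_model.", "vit_large_projector.")


def extract_text_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the nested text config if present, else of the config itself.

    Assumes the nested section lives under a "text_config" key.
    """
    nested = config.get("text_config")
    return dict(nested) if isinstance(nested, dict) else dict(config)


def drop_vision_tensors(weights: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only tensors whose names do not start with a vision prefix."""
    return {
        name: tensor
        for name, tensor in weights.items()
        if not name.startswith(VISION_PREFIXES)
    }


if __name__ == "__main__":
    # Toy inputs; names and values are placeholders, not real checkpoint contents.
    config = {"model_type": "outer", "text_config": {"hidden_size": 8}}
    weights = {
        "model.layers.0.self_attn.q_proj.weight": 0,
        "vision_model.patch_embed.weight": 1,
    }
    print(extract_text_config(config))
    print(sorted(drop_vision_tensors(weights)))
```

Filtering by name keeps the text-only load independent of the vision weights, which matches the statement that those weights ship in the bundle but are not used for text generation.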

Required text runtime behavior:

- load `model_file=step3p7_mlx.py`
- preserve the source chat template; it opens the assistant generation prompt inside ``
- use the normal KV cache with the Step full/sliding attention behavior from the Step3p5 MLX runtime
- do not add a second synthetic reasoning prefix
- use `PreTrainedTokenizerFast`; otherwise the source tokenizer metadata selects a Llama tokenizer class that decodes byte-level markers incorrectly

Full image-input VLM coherence is not claimed by this artifact. The vision weights are present, but image patch expansion and projector routing still need a Step3p7 VLM wrapper in the target runtime.

## Summary

This bundle is the result of converting Step-3.7-Flash-NVFP4 with the JANG_K affine `4/2/2` expert bit policy. The text path has passed local MLX generation verification. The vision weights are included, but the image input path still requires a separate runtime implementation and verification.
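
The index-level checks listed under Status (shard count, tensor count, and absence of raw NVFP4 sidecars) can be reproduced approximately with the sketch below. It assumes the bundle ships a Hugging Face style index file with a `weight_map` object mapping tensor names to shard file names; the file name `model.safetensors.index.json` and the helper names are assumptions, not something this card confirms.

```python
import json
from pathlib import Path
from typing import Dict

# Suffixes of the raw NVFP4 sidecar tensors named in the Status list.
NVFP4_SIDECAR_SUFFIXES = (".weight_scale", ".weight_scale_2", ".input_scale")


def summarize_weight_map(weight_map: Dict[str, str]) -> Dict[str, object]:
    """Count shards and tensors, and list any leftover NVFP4 sidecar tensors."""
    sidecars = sorted(
        name for name in weight_map if name.endswith(NVFP4_SIDECAR_SUFFIXES)
    )
    return {
        "shards": len(set(weight_map.values())),
        "tensors": len(weight_map),
        "nvfp4_sidecars": sidecars,
    }


def summarize_index_file(path: str) -> Dict[str, object]:
    """Load an index file (assumed layout: {"weight_map": {...}}) and summarize it."""
    index = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarize_weight_map(index["weight_map"])


if __name__ == "__main__":
    # Assumed file name; adjust to whatever the bundle actually contains.
    print(summarize_index_file("model.safetensors.index.json"))
```

If the assumptions hold, running this against the bundle should report 58 shards, 2,570 tensors, and an empty `nvfp4_sidecars` list, in line with the Status section.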