---
license: other
license_name: prism-research
license_link: LICENSE.md
language:
- en
- zh
tags:
- stepfun
- prism
- moe
- reasoning
- coding
- agentic
- abliterated
pipeline_tag: text-generation
library_name: transformers
base_model:
- stepfun-ai/Step-3.5-Flash
base_model_relation: finetune
---

[![Parameters](https://img.shields.io/badge/Parameters-196B_(11B_Active)-blue)]()
[![Architecture](https://img.shields.io/badge/Architecture-MoE-green)]()
[![Context](https://img.shields.io/badge/Context-256K-orange)]()
[![MTP](https://img.shields.io/badge/MTP--3-350_tok%2Fs_Peak-purple)]()


<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/NkmQvQUXzckiRb8U__203.png" width="400"/>
</p>

# Step-3.5-Flash-PRISM

An unrestricted, role-play-capable PRISM-LITE version of [StepFun's Step 3.5 Flash](https://huggingface.co/stepfun-ai/Step-3.5-Flash), built with our SOTA PRISM pipeline specifically to suppress over-refusal behaviors and propaganda mechanisms.

For full custom production PRISM versions and tensors, reach out.
<div align="center">

### ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)

| Option | Description |
|--------|-------------|
| [**PRISM VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
| **Bitcoin** | `bc1qarq2pyn4psjpcxzp2ghgwaq6y2h4e53q232x8r` |

![image](https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/Psgbl1TgyDok__C7AMQog.png)

</div>

---

## Model Highlights

- **PRISM Ablation** — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- **196B MoE Architecture** — 196 billion total parameters with only 11 billion active per token across 288 fine-grained routed experts + 1 shared expert
- **Multi-Token Prediction (MTP-3)** — Predicts 4 tokens simultaneously, achieving 100–300 tok/s typical throughput (peaking at 350 tok/s)
- **256K Context Window** — Cost-efficient long context via 3:1 Sliding Window Attention (SWA) ratio
- **Frontier Reasoning & Coding** — 97.3 on AIME 2025, 74.4% on SWE-bench Verified, 51.0% on Terminal-Bench 2.0
- **Accessible Local Deployment** — Runs on high-end consumer hardware (Mac Studio M4 Max, NVIDIA DGX Spark)

## Model Architecture

| Specification | Value |
|---------------|-------|
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Backbone | 45-layer Transformer (4,096 hidden dim) |
| Total Parameters | 196.81B (196B Backbone + 0.81B Head) |
| Activated Parameters | ~11B (per token) |
| Routed Experts per Layer | 288 |
| Shared Experts | 1 (always active) |
| Selected Experts per Token | Top-8 |
| Vocabulary Size | 128,896 |
| Context Length | 256K |
| Attention | Hybrid SWA (3:1 SWA-to-Full ratio) |
| MTP Head | Sliding-window attention + dense FFN (4 tokens/pass) |
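The routing scheme in the table can be sketched as follows. This is a minimal illustration of top-8 selection over 288 routed experts with softmax-renormalized gates; the actual router (normalization order, load-balancing losses, gate scaling) is not specified in this card and may differ:

```python
import math
import random

N_ROUTED, TOP_K = 288, 8  # routed experts per layer, experts chosen per token

def route(router_logits):
    """Pick the top-8 of the routed experts for one token and return
    (expert indices, softmax-renormalized gate weights). The single
    shared expert is always active and bypasses this routing."""
    top = sorted(range(len(router_logits)), key=router_logits.__getitem__)[-TOP_K:]
    m = max(router_logits[i] for i in top)              # for numerical stability
    exp = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exp)
    return top, [e / z for e in exp]

random.seed(0)
idx, gates = route([random.gauss(0, 1) for _ in range(N_ROUTED)])
print(len(idx), round(sum(gates), 6))  # 8 1.0
```

Only the 8 selected experts (plus the shared expert) run their FFNs for a given token, which is how 196B total parameters reduce to ~11B activated per token.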

## Benchmarks

| Benchmark | Step 3.5 Flash | DeepSeek V3.2 | Kimi K2.5 | GLM-4.7 | MiniMax M2.1 |
|-----------|---------------|---------------|-----------|---------|--------------|
| **Agent** | | | | | |
| τ²-Bench | 88.2 | 80.3 | 85.4 | 87.4 | 86.6 |
| BrowseComp | 51.6 | 51.4 | 60.6 | 52.0 | 47.4 |
| GAIA (no file) | 84.5 | 75.1 | 75.9 | 61.9 | 64.3 |
| xbench-DeepSearch (2025.05) | 83.7 | 78.0 | 76.7 | 72.0 | 68.7 |
| **Reasoning** | | | | | |
| AIME 2025 | 97.3 | 93.1 | 96.1 | 95.7 | 83.0 |
| HMMT 2025 (Feb.) | 98.4 | 92.5 | 95.4 | 97.1 | 71.0 |
| IMOAnswerBench | 85.4 | 78.3 | 81.8 | 82.0 | 60.4 |
| **Coding** | | | | | |
| LiveCodeBench-V6 | 86.4 | 83.3 | 85.0 | 84.9 | — |
| SWE-bench Verified | 74.4 | 73.1 | 76.8 | 73.8 | 74.0 |
| Terminal-Bench 2.0 | 51.0 | 46.4 | 50.8 | 41.0 | 47.9 |


## llama.cpp (GGUF)

For local deployment (int4 requires ~120 GB of memory; smaller quants are available):

```bash
./llama-cli -m step3.5_flash_prism_Q4_K_S.gguf --jinja
```

## Recommended Parameters

| Use Case | Temperature | Top-P | Max New Tokens |
|----------|-------------|-------|----------------|
| Reasoning / Coding | 1.0 | 0.95 | 32768 |
| General Chat | 0.6 | 0.95 | 4096 |
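For context on the Top-P column: nucleus sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches `top_p`, then renormalizes before sampling. A minimal sketch with illustrative probabilities (not the model's actual decoding code):

```python
def top_p_filter(probs, top_p=0.95):
    """Keep the smallest prefix of tokens (sorted by descending probability)
    whose cumulative mass reaches top_p, then renormalize to sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    z = sum(probs[i] for i in kept)
    return {i: probs[i] / z for i in kept}

# With top_p=0.95 the 0.02-mass tail token is dropped; the rest renormalize.
print(top_p_filter([0.5, 0.3, 0.18, 0.02], top_p=0.95))
```

A temperature of 1.0 for reasoning/coding leaves the distribution unscaled and relies on top-p alone to trim the tail, while 0.6 for general chat sharpens the distribution toward high-probability tokens.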

## Hardware Requirements

| Setup | Details |
|-------|---------|
| **BF16 (Full)** | 8x H100/A100 80GB with tensor parallelism |
| **FP8 Quantized** | 8x A100 80GB with expert parallelism |
| **GGUF INT4 (Local)** | ~120 GB unified memory (Mac Studio M4 Max 128GB, DGX Spark, AMD Ryzen AI Max+ 395) |
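As a back-of-the-envelope check on the ~120 GB figure, the weight footprint is roughly total parameters times effective bits per weight. The sketch below assumes ~4.5 effective bits/weight for a Q4_K_S-style quant (an assumption; the exact figure depends on the quant mix), with KV cache and runtime buffers adding more on top:

```python
def gguf_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-only footprint in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Total parameters taken from the architecture table above.
est = gguf_weight_gb(196.81, 4.5)
print(round(est, 1))  # 110.7 GB for weights alone, before KV cache
```

The gap between ~111 GB of weights and the ~120 GB recommendation is headroom for the KV cache and compute buffers, which grow with context length.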

## License

This model is released under the [PRISM Research License](LICENSE.md).

## Acknowledgments

Based on [Step 3.5 Flash](https://huggingface.co/stepfun-ai/Step-3.5-Flash) by [StepFun AI](https://www.stepfun.com). See the [technical report](https://github.com/stepfun-ai/Step-3.5-Flash/blob/main/step_3p5_flash_tech_report.pdf) and [blog post](https://static.stepfun.com/blog/step-3.5-flash/) for more details on the base model.