- **Accessible Local Deployment**: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.
As local deployment of large language models (LLMs) becomes increasingly prevalent, we have adapted Step 3.5 Flash to the NVIDIA DGX Spark (128 GB) using the edge-side inference engine llama.cpp, and have simultaneously released INT4-quantized model weights in GGUF format. On the NVIDIA DGX Spark, Step 3.5 Flash achieves a generation speed of 20 tokens per second; with INT8 quantization of the KV cache, it supports an extended context window of up to 256K tokens, delivering long-text processing on par with cloud-based inference. Developers can test the new model on NVIDIA accelerated infrastructure via build.nvidia.com.
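As a rough sketch, a local run with llama.cpp might be launched as follows. The GGUF filename is a placeholder (not the released artifact's actual name); the flags shown are standard llama.cpp options, and the exact values (context size, GPU offload) depend on your build and hardware:

```shell
# Sketch of a llama.cpp run for local deployment (model filename is a placeholder).
#   -m                       : INT4-quantized weights in GGUF format
#   -c 262144                : 256K-token context window
#   --cache-type-k/v q8_0    : 8-bit quantization of the KV cache
#   -ngl 99                  : offload all layers to the GPU
./llama-cli \
  -m step-3.5-flash-int4.gguf \
  -c 262144 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -ngl 99 \
  -p "Summarize the following document:"
```

`llama-server` accepts the same flags if you prefer an OpenAI-compatible HTTP endpoint over an interactive CLI session.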
## 3. Performance
Step 3.5 Flash delivers performance parity with leading closed-source systems while remaining open and efficient.