- **Accessible Local Deployment**: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.
As local deployment of large language models (LLMs) becomes increasingly prevalent, we have adapted Step 3.5 Flash to the NVIDIA DGX Spark (128 GB) using the edge-side inference engine llama.cpp, and have simultaneously released INT4-quantized model weights in GGUF format. On the NVIDIA DGX Spark, Step 3.5 Flash achieves a generation speed of 20 tokens per second; with INT8 quantization of the KV cache, it supports an extended context window of up to 256K tokens, delivering long-text processing on par with cloud-based inference. Developers can test the new model on NVIDIA accelerated infrastructure via build.nvidia.com.
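As a rough sketch, a local run with llama.cpp might be launched as follows. The GGUF filename is a placeholder (not the released artifact's actual name); the flags shown are standard llama.cpp options, and the exact values (context size, GPU offload) depend on your build and hardware:

```shell
# Sketch of a llama.cpp run for local deployment (model filename is a placeholder).
#   -m                       : INT4-quantized weights in GGUF format
#   -c 262144                : 256K-token context window
#   --cache-type-k/v q8_0    : 8-bit quantization of the KV cache
#   -ngl 99                  : offload all layers to the GPU
./llama-cli \
  -m step-3.5-flash-int4.gguf \
  -c 262144 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -ngl 99 \
  -p "Summarize the following document:"
```

`llama-server` accepts the same flags if you prefer an OpenAI-compatible HTTP endpoint over an interactive CLI session.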
## 3. Performance
Step 3.5 Flash delivers performance parity with leading closed-source systems while remaining open and efficient.