WinstonDeng commited on
Commit
9e8fafe
·
verified ·
1 Parent(s): 224a8c2

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -0
README.md CHANGED
@@ -35,6 +35,8 @@ base_model:
35
 
36
  - **Accessible Local Deployment**: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.
37
 
 
 
38
  ## 3. Performance
39
 
40
  Step 3.5 Flash delivers performance parity with leading closed-source systems while remaining open and efficient.
 
35
 
36
  - **Accessible Local Deployment**: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.
37
 
38
+ As the local deployment of large language models (LLMs) becomes increasingly prevalent, we have successfully adapted the Step 3.5 Flash to NVIDIA DGX Spark 128GB device based on the edge-side inference engine llama.cpp, and simultaneously released the INT4 quantized model weights in GGUF format. On NVIDIA DGX Spark, the Step 3.5 Flash achieves a generation speed of 20 tokens per second; by integrating the INT8 quantization technology for KVCache, it supports an extended context window of up to 256K tokens, thus delivering long text processing capabilities on par with cloud-based inference. The new model can be tested by developers on NVIDIA accelerated infrastructure via build.nvidia.com
39
+
40
  ## 3. Performance
41
 
42
  Step 3.5 Flash delivers performance parity with leading closed-source systems while remaining open and efficient.