rockylynnstein committed on
Commit 7863054 · verified · 1 Parent(s): 32fafeb

Update README.md

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -31,6 +31,14 @@ FP8 (8-bit floating point) quantization of NextCoder-32B, optimized for fast cod
  | Quantization Time | 213.8 minutes |
  | Hardware Used | NVIDIA RTX 5000 Ada Generation (31.5 GB) |
 
+ #### Quantization Infrastructure
+
+ Quantized on professional hardware to ensure quality and reliability:
+ - **CPUs:** Dual Intel Xeon Max 9480 (224 threads, 128GB HBM2e)
+ - **GPU:** NVIDIA RTX 5000 Ada Generation (32GB VRAM) with native FP8 support
+ - **Memory:** 256GB DDR5 + 128GB HBM2e = 384GB total
+ - **Software:** Ubuntu 25.10 | Python 3.12 | PyTorch 2.8 | CUDA 13 | llm-compressor
+
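
Note on the "native FP8 support" claim in the added section: FP8 tensor cores are available from CUDA compute capability 8.9 (Ada Lovelace, which includes the RTX 5000 Ada) and 9.0 (Hopper) onward. The following is a minimal sketch, not part of this commit, of how a reader might check their own GPU. The helper names `supports_native_fp8` and `local_gpu_supports_fp8` are hypothetical, and the second helper assumes PyTorch is installed.

```python
from typing import Optional

# Lowest CUDA compute capability with FP8 tensor cores:
# 8.9 is Ada Lovelace (e.g. RTX 5000 Ada), 9.0 is Hopper.
FP8_MIN_CAPABILITY = (8, 9)


def supports_native_fp8(major: int, minor: int) -> bool:
    """Return True if this compute capability includes FP8 tensor cores."""
    return (major, minor) >= FP8_MIN_CAPABILITY


def local_gpu_supports_fp8() -> Optional[bool]:
    """Check CUDA device 0; return None if PyTorch or a GPU is unavailable."""
    try:
        import torch  # optional dependency, assumed present on the target machine
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return supports_native_fp8(major, minor)


if __name__ == "__main__":
    print("Ada (8.9):", supports_native_fp8(8, 9))
    print("Ampere (8.6):", supports_native_fp8(8, 6))
    print("Local GPU:", local_gpu_supports_fp8())
```

On GPUs below 8.9, an FP8 checkpoint may still load in some runtimes through weight-only fallbacks, but that depends on the inference stack and is not something this commit states.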
  ## Usage
 
  ### Loading the Model