TevunahAi/NextCoder-32B-FP8

Text Generation

text-generation-inference

compressed-tensors

rockylynnstein committed on Nov 23, 2025

Commit 7863054 · verified · 1 Parent(s): 32fafeb

Update README.md

Files changed (1)

README.md +8 -0

@@ -31,6 +31,14 @@ FP8 (8-bit floating point) quantization of NextCoder-32B, optimized for fast cod
 | Quantization Time | 213.8 minutes |
 | Hardware Used | NVIDIA RTX 5000 Ada Generation (31.5 GB) |
+#### Quantization Infrastructure
+Quantized on professional hardware to ensure quality and reliability:
+- **CPUs:** Dual Intel Xeon Max 9480 (224 threads, 128GB HBM2e)
+- **GPU:** NVIDIA RTX 5000 Ada Generation (32GB VRAM) with native FP8 support
+- **Memory:** 256GB DDR5 + 128GB HBM2e = 384GB total
+- **Software:** Ubuntu 25.10 | Python 3.12 | PyTorch 2.8 | CUDA 13 | llm-compressor
 ## Usage
 ### Loading the Model