Update README.md
README.md CHANGED

```diff
@@ -137,7 +137,6 @@ pip install torch>=2.1.0 transformers>=4.40.0 accelerate compressed-tensors
 | **Base Model** | [microsoft/NextCoder-7B](https://huggingface.co/microsoft/NextCoder-7B) |
 | **Quantization Method** | FP8 E4M3 weight-only |
 | **Framework** | llm-compressor + compressed_tensors |
-| **Calibration Samples** | 2048 (8x industry standard) |
 | **Storage Size** | ~7GB (3 sharded safetensors) |
 | **VRAM (vLLM)** | ~7GB |
 | **VRAM (Transformers)** | ~14GB (decompressed to BF16) |
@@ -177,12 +176,6 @@ This model is sharded into 3 safetensors files (all required for inference):
 - `model-00002-of-00003.safetensors`
 - `model-00003-of-00003.safetensors`
 
-## 🔬 Quality Assurance
-
-- **High-quality calibration:** 2048 diverse code samples (8x industry standard of 256)
-- **Validation:** Tested on code generation benchmarks
-- **Format:** Standard compressed_tensors for broad compatibility
-
 ## 📚 Original Model
 
 This quantization is based on [microsoft/NextCoder-7B](https://huggingface.co/microsoft/NextCoder-7B) by Microsoft.
```
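The storage and VRAM figures in the table are consistent with simple back-of-envelope arithmetic: FP8 E4M3 stores one byte per weight, and decompressing to BF16 doubles that. A minimal sanity-check sketch (illustrative only; it ignores activations, KV cache, and any layers kept at higher precision):

```python
# Rough weight-memory math for a 7B-parameter model.
PARAMS = 7_000_000_000

fp8_bytes = PARAMS * 1    # FP8 E4M3: 1 byte per weight
bf16_bytes = PARAMS * 2   # BF16: 2 bytes per weight

fp8_gb = fp8_bytes / 1e9
bf16_gb = bf16_bytes / 1e9

print(f"FP8 weights:  ~{fp8_gb:.0f} GB")   # matches the ~7GB storage / vLLM figure
print(f"BF16 weights: ~{bf16_gb:.0f} GB")  # matches the ~14GB Transformers figure
```

This is why vLLM (which keeps the FP8 weights compressed) needs roughly half the VRAM that Transformers does after decompression to BF16.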