DeepSeek-Coder-V2-Lite-Instruct-NVFP4

NVFP4 (4-bit floating point) quantized version of deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.

Model Details

Attribute        Value
Base Model       deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
Quantization     NVFP4 (E2M1 + block scaling)
Block Size       32
Original Size    31.41 GB
Quantized Size   9.44 GB
Compression      3.33x
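
The ratio follows directly from the sizes: 31.41 / 9.44 ≈ 3.33. It sits below the ideal 4x of a straight 16-bit to 4-bit conversion because each 32-element block carries an fp16 scale (an extra 16/32 = 0.5 bits per weight) and 84 tensors remain in FP16 entirely (see Quantization Stats below).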

NVFP4 Format

NVFP4 is a 4-bit floating point format optimized for NVIDIA hardware; a toy encoding sketch follows the list:

  • Format: 1 sign bit, 2 exponent bits, 1 mantissa bit (E2M1)
  • Representable values: ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}
  • Block-wise scaling: 32 elements per scale factor
  • Target hardware: NVIDIA GPUs including Jetson/Spark
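
To make the encoding concrete, here is a minimal, illustrative sketch of how a single 32-element block could be quantized: one shared scale per block, each value rounded to the nearest E2M1 magnitude. This is not this repo's actual quantizer; quantize_block is a hypothetical helper. The nibble layout (sign in the high bit, 3-bit magnitude index below it, first value in the high nibble) mirrors the decoder in the Usage section.

import torch

# E2M1 magnitudes addressable by the 3-bit index
FP4_VALUES = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    # Scale so the block's largest magnitude lands on 6.0, the E2M1 maximum.
    # (A real quantizer would guard against all-zero blocks.)
    scale = block.abs().max() / 6.0
    scaled = block / scale
    # Round each element to the nearest representable E2M1 magnitude.
    idx = (scaled.abs().unsqueeze(-1) - FP4_VALUES).abs().argmin(dim=-1)
    sign = (scaled < 0).long()
    codes = (sign << 3) | idx                   # 4-bit code: sign bit above 3-bit index
    packed = (codes[0::2] << 4) | codes[1::2]   # two codes per byte, first in high nibble
    return packed.to(torch.uint8), scale

block = torch.randn(32)
packed, scale = quantize_block(block)
print(packed.shape, scale)  # torch.Size([16]) and a scalar scale

Running the result back through dequantize_nvfp4 from the Usage section should reproduce the inputs up to E2M1 rounding error.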

Quantization Stats

  • Tensors quantized: 5207
  • Tensors preserved (FP16): 84 (embeddings, norms, biases)
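
The split between quantized and preserved tensors is typically name-based. The exact rule used for this checkpoint is not documented here, but a hypothetical filter in that spirit, matching the categories listed above, might look like:

def should_preserve(name):
    # Embeddings, norms, and biases stay in FP16; everything else is quantized.
    return any(k in name for k in ("embed", "norm", "bias"))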

Files

  • model_nvfp4_quantized.safetensors - Quantized weight tensors (packed uint8 + fp16 scales)
  • model_nvfp4_unquantized.safetensors - Preserved tensors (embeddings, norms, biases)
  • nvfp4_metadata.json - Quantization metadata
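
Each quantized weight is stored as a triplet of tensors in the quantized file. A quick sanity check, assuming the name.packed / name.scales / name.shape key convention shown under Usage:

from safetensors import safe_open

with safe_open("model_nvfp4_quantized.safetensors", framework="pt") as f:
    keys = set(f.keys())
    for key in keys:
        if key.endswith(".packed"):
            name = key[: -len(".packed")]
            # Every packed tensor should have matching scales and shape entries.
            assert f"{name}.scales" in keys and f"{name}.shape" in keys, name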

Usage

from safetensors import safe_open
import math
import torch

# E2M1 magnitude lookup table (3-bit index -> positive value)
FP4_VALUES = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def dequantize_nvfp4(packed, scales, shape, block_size=32):
    # Each uint8 holds two codes: value 1 in the high nibble (sign at bit 7,
    # index at bits 4-6), value 2 in the low nibble (sign at bit 3, index at bits 0-2).
    sign1 = ((packed >> 7) & 1).float()
    idx1 = ((packed >> 4) & 7).long()   # .long() needed for tensor indexing
    sign2 = ((packed >> 3) & 1).float()
    idx2 = (packed & 7).long()

    num_blocks = packed.shape[0]
    unpacked = torch.zeros(num_blocks, block_size)
    unpacked[:, 0::2] = FP4_VALUES[idx1] * (1 - 2 * sign1)
    unpacked[:, 1::2] = FP4_VALUES[idx2] * (1 - 2 * sign2)

    # Apply the per-block scale, then trim padding and restore the original shape.
    dequantized = (unpacked * scales.float().unsqueeze(-1)).flatten()
    return dequantized[: math.prod(shape)].view(shape)

# Load and dequantize every quantized tensor
with safe_open("model_nvfp4_quantized.safetensors", framework="pt") as f:
    for key in f.keys():
        if key.endswith(".packed"):
            name = key[: -len(".packed")]
            packed = f.get_tensor(key)
            scales = f.get_tensor(f"{name}.scales")
            shape = tuple(f.get_tensor(f"{name}.shape").tolist())
            weight = dequantize_nvfp4(packed, scales, shape)
            print(f"{name}: {weight.shape}")

Base Model

DeepSeek-Coder-V2-Lite-Instruct is a Mixture-of-Experts (MoE) code model:

  • 16B total / 2.4B active parameters
  • 64 routed experts, 6 activated per token, plus 2 shared experts
  • 128K context length
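
Per token, 6 of the 64 routed experts fire plus the 2 always-on shared experts; together with the dense attention and embedding weights, that is what brings the active parameter count down to roughly 2.4B of 16B, about 15%.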

License

Inherits the license of the base model; see the original model card for details.
