
DeepSeek V3.2 FP8 -> FP4 (E2E)

This is a minimal, V3.2-only end-to-end flow using the official V3.2 inference code (not the V3.2-Exp repo). Each step below notes what it does.

Set paths (only the vars actually used below)

# NOTE: HF cache stores safetensors under snapshots/<hash>/, NOT at the model root!
# Find the correct path with: ls /root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3.2/snapshots/
export HF_FP8_CKPT="/root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3.2/snapshots/a7e62ac04ecb2c0a54d736dc46601c5606cf10a6"
export DS_CKPT="/sgl-workspace/sglang/Model-Optimizer/examples/deepseek/out/deepseek-v3.2-fp8"
export FP4_QUANT_PATH="/sgl-workspace/sglang/Model-Optimizer/examples/deepseek/out/deepseek-v3.2-fp4-calib"
export HF_FP4_PATH="/sgl-workspace/sglang/Model-Optimizer/examples/deepseek/out/deepseek-v3.2-fp4-hf"
export MP=4  # must match --nproc-per-node in PTQ step
export EXPERTS=256
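The snapshot hash in HF_FP8_CKPT is machine-specific, so a quick pre-flight check catches a stale path before the long-running steps. A minimal sketch (`check_dir` is a hypothetical helper, not part of the repo; only the FP8 input must already exist — the other paths are outputs):

```shell
# pre-flight: warn if the FP8 input snapshot path does not resolve
check_dir() {
  if [ -d "$1" ]; then echo "ok: $1"; else echo "MISSING: $1"; fi
}
check_dir "$HF_FP8_CKPT"
```

If this prints MISSING, re-run the `ls .../snapshots/` command from the note above and update the hash.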

Get the V3.2 inference repo (skip safetensors download)

# work from the Model-Optimizer example directory so paths below match
cd /sgl-workspace/sglang/Model-Optimizer/examples/deepseek

# clone the official V3.2 inference repo without pulling LFS (avoids safetensors download)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-V3.2 && cd DeepSeek-V3.2

# install fast-hadamard-transform (FHT) from source; see setup.py in the FHT repo for extra build steps / env vars
pip install git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# install inference dependencies from the official repo
pip install -r inference/requirements.txt
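Because FHT is built from source, it is worth confirming the build actually produced an importable module before moving on. A hedged check, assuming the Python package name is `fast_hadamard_transform`:

```shell
# sanity check: the from-source pip install above should make this import succeed
python -c "import fast_hadamard_transform; print('FHT OK')" \
  || echo "FHT import failed -- check the build log from the pip install step"
```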

Convert the HF checkpoint to DeepSeek format

# the clone step above left us inside DeepSeek-V3.2, so just enter inference/
cd inference
python convert.py \
  --hf-ckpt-path $HF_FP8_CKPT \
  --save-path $DS_CKPT \
  --n-experts $EXPERTS \
  --model-parallel $MP
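convert.py shards the checkpoint across $MP model-parallel ranks, so a zero shard count in the output directory means the conversion failed or wrote elsewhere. A quick sketch (`count_shards` is a hypothetical helper; the exact shard naming is the converter's, not specified here):

```shell
# count safetensors shards written by convert.py (0 means the conversion did not produce output here)
count_shards() { ls "$1"/*.safetensors 2>/dev/null | wc -l; }
echo "shards in $DS_CKPT: $(count_shards "$DS_CKPT")"
```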

Run calibration (PTQ) for V3.2

cd /sgl-workspace/sglang/Model-Optimizer/examples/deepseek
# --nproc-per-node MUST match --model-parallel used in convert.py (i.e. $MP)
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc-per-node $MP --master_port=12346 ptq.py \
  --model_path $DS_CKPT \
  --config DeepSeek-V3.2/inference/config_671B_v3.2.json \
  --quant_cfg NVFP4_DEFAULT_CFG \
  --output_path $FP4_QUANT_PATH
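A mismatch between the number of visible GPUs and $MP is an easy way for this step to fail, so a small guard can be run before torchrun. This is an assumption-laden sketch, not part of ptq.py:

```shell
# warn if CUDA_VISIBLE_DEVICES exposes a different number of GPUs than MP
ngpu=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)
if [ "$ngpu" -eq "${MP:-0}" ]; then
  echo "GPU count matches MP=$MP"
else
  echo "warning: $ngpu visible GPUs but MP=${MP:-unset}"
fi
```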

Quantize FP8 -> NVFP4 and assemble final HF checkpoint

./quantize_fp8_to_nvfp4.sh \
  --amax_path $FP4_QUANT_PATH \
  --fp4_output_path $HF_FP4_PATH \
  --fp8_hf_path $HF_FP8_CKPT \
  --world_size $MP
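A usable HF checkpoint directory should at minimum contain a config.json plus safetensors shards. This hedged post-check (`verify_hf_ckpt` is a hypothetical helper) flags an obviously incomplete output before you try to serve it:

```shell
# minimal completeness check on the assembled FP4 HF checkpoint
verify_hf_ckpt() {
  if [ -f "$1/config.json" ] && ls "$1"/*.safetensors >/dev/null 2>&1; then
    echo "looks complete: $1"
  else
    echo "incomplete: $1"
  fi
}
verify_hf_ckpt "$HF_FP4_PATH"
```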

(Optional) Run V3.2 inference to verify the converted checkpoint

cd DeepSeek-V3.2/inference
export CONFIG=config_671B_v3.2.json
torchrun --nproc-per-node $MP generate.py --ckpt-path $DS_CKPT --config $CONFIG --interactive