# DeepSeek V3.2 FP8 -> FP4 (E2E)

This is a minimal, V3.2-only end-to-end flow using the official V3.2 inference code (not V3.2-Exp). Each step says what it's doing.
## Set paths (only the vars actually used below)

```bash
# NOTE: HF cache stores safetensors under snapshots/<hash>/, NOT at the model root!
# Find the correct path with: ls /root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3.2/snapshots/
export HF_FP8_CKPT="/root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3.2/snapshots/a7e62ac04ecb2c0a54d736dc46601c5606cf10a6"
export DS_CKPT="/sgl-workspace/sglang/Model-Optimizer/examples/deepseek/out/deepseek-v3.2-fp8"
export FP4_QUANT_PATH="/sgl-workspace/sglang/Model-Optimizer/examples/deepseek/out/deepseek-v3.2-fp4-calib"
export HF_FP4_PATH="/sgl-workspace/sglang/Model-Optimizer/examples/deepseek/out/deepseek-v3.2-fp4-hf"
export MP=4        # must match --nproc-per-node in the PTQ step
export EXPERTS=256
```
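Since every later step depends on these variables, it's worth failing fast on anything unset. A minimal bash sketch (the `check_vars` helper is illustrative, not part of any repo; it uses bash indirect expansion, so run it under bash):

```bash
# hypothetical helper: report any variables from the list above that are unset or empty
check_vars() {
  local status=0
  for v in "$@"; do
    if [ -z "${!v}" ]; then
      echo "unset: $v"
      status=1
    fi
  done
  return $status
}

check_vars HF_FP8_CKPT DS_CKPT FP4_QUANT_PATH HF_FP4_PATH MP EXPERTS \
  || echo "set the variables above before continuing"
```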
## Get the V3.2 inference repo (skip safetensors download)

```bash
# work from the Model-Optimizer example directory so the paths below match
cd /sgl-workspace/sglang/Model-Optimizer/examples/deepseek

# clone the official V3.2 inference repo without pulling LFS (avoids the safetensors download)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-V3.2 && cd DeepSeek-V3.2

# install fast-hadamard-transform (FHT) from source; see setup.py in the FHT repo
# for build requirements and environment variables
pip install git+https://github.com/Dao-AILab/fast-hadamard-transform.git

# install inference dependencies from the official repo
pip install -r inference/requirements.txt
```
## Convert the HF checkpoint to DeepSeek format

```bash
cd /sgl-workspace/sglang/Model-Optimizer/examples/deepseek/DeepSeek-V3.2/inference
python convert.py \
    --hf-ckpt-path $HF_FP8_CKPT \
    --save-path $DS_CKPT \
    --n-experts $EXPERTS \
    --model-parallel $MP
```
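After conversion, `$DS_CKPT` should hold one safetensors shard per model-parallel rank. A quick count (the `model*-mp*.safetensors` glob is an assumption based on the V3-style `convert.py` output naming; adjust it if your files differ):

```bash
# count converted shards; expect one per rank, i.e. $MP files
count_shards() { ls "$1"/model*-mp*.safetensors 2>/dev/null | wc -l; }
echo "found $(count_shards "$DS_CKPT") shard(s), expected $MP"
```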
## Run calibration (PTQ) for V3.2

```bash
cd /sgl-workspace/sglang/Model-Optimizer/examples/deepseek

# --nproc-per-node MUST match --model-parallel used in convert.py (i.e. $MP)
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc-per-node $MP --master_port=12346 ptq.py \
    --model_path $DS_CKPT \
    --config DeepSeek-V3.2/inference/config_671B_v3.2.json \
    --quant_cfg NVFP4_DEFAULT_CFG \
    --output_path $FP4_QUANT_PATH
```
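The PTQ step writes calibration artifacts (the amax values consumed by the next step) to `$FP4_QUANT_PATH`. The exact file names depend on `ptq.py` and are not verified here, so this sketch only checks that the directory came out non-empty:

```bash
# check that the calibration output directory exists and is not empty
nonempty_dir() { [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; }
nonempty_dir "$FP4_QUANT_PATH" \
  && echo "calibration output present" \
  || echo "no calibration output found in $FP4_QUANT_PATH"
```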
## Quantize FP8 -> NVFP4 and assemble final HF checkpoint

```bash
./quantize_fp8_to_nvfp4.sh \
    --amax_path $FP4_QUANT_PATH \
    --fp4_output_path $HF_FP4_PATH \
    --fp8_hf_path $HF_FP8_CKPT \
    --world_size $MP
```
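If the script succeeded, `$HF_FP4_PATH` should look like a standard Hugging Face checkpoint directory. A loose sanity check (the file names assume the usual HF layout, not the script's verified output; the helper is illustrative):

```bash
# look for the usual HF checkpoint files in the output directory
check_hf_ckpt() {
  local ok=0
  for f in config.json model.safetensors.index.json; do
    [ -e "$1/$f" ] || { echo "missing: $f"; ok=1; }
  done
  return $ok
}
check_hf_ckpt "$HF_FP4_PATH" \
  && echo "checkpoint layout looks complete" \
  || echo "incomplete checkpoint layout in $HF_FP4_PATH"
```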
## (Optional) Run V3.2 inference to verify the converted checkpoint

```bash
cd DeepSeek-V3.2/inference
export CONFIG=config_671B_v3.2.json
torchrun --nproc-per-node $MP generate.py --ckpt-path $DS_CKPT --config $CONFIG --interactive
```
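For a non-interactive smoke test, V3's `generate.py` accepted a prompt file; assuming V3.2 kept the same flags (check `generate.py --help` first; `--input-file` and `--max-new-tokens` are not verified here):

```bash
# write a one-line prompt file and run batch generation (flags assumed from V3's generate.py)
printf '%s\n' "What is 2+2?" > prompts.txt
torchrun --nproc-per-node $MP generate.py \
    --ckpt-path $DS_CKPT --config $CONFIG \
    --input-file prompts.txt --max-new-tokens 64
```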