Tiny random version of DeepSeek-V4.

In development. Might not work!

```shell
vllm serve yujiepan/deepseek-v4-tiny-random \
    --trust-remote-code \
    --block-size 256 \
    --kv-cache-dtype fp8 \
    --data-parallel-size 1 \
    --max-model-len 12000 \
    --gpu-memory-utilization 0.5 \
    --max-num-seqs 512 \
    --max-num-batched-tokens 512 \
    --no-enable-flashinfer-autotune \
    --compilation-config '{"mode": 0, "cudagraph_mode": "FULL_DECODE_ONLY"}' \
    --tokenizer-mode deepseek_v4 \
    --tool-call-parser deepseek_v4 \
    --enable-auto-tool-choice \
    --reasoning-parser deepseek_v4 \
    --speculative_config '{"method":"mtp","num_speculative_tokens":1}'
```
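Once the server is up, it exposes an OpenAI-compatible API (on port 8000 by default). A minimal sketch of a chat-completion request using only the Python standard library — the prompt and `max_tokens` value are placeholders:

```python
import json
import urllib.request

# OpenAI-compatible chat-completions payload; "model" must match the served name.
payload = {
    "model": "yujiepan/deepseek-v4-tiny-random",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # default vllm serve endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```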