model unusable
#3
by zaddyzaddy - opened
Tried serving with both vLLM and SGLang. The SGLang command:
sglang serve \
--trust-remote-code \
--model-path Chunjiang-Intelligence/DeepSeek-v4-Fable \
--tp 8 \
--moe-runner-backend flashinfer_mxfp4 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--chunked-prefill-size 4096 \
--disable-flashinfer-autotune \
--swa-full-tokens-ratio 0.1 \
--reasoning-parser deepseek-v4 \
--tool-call-parser deepseekv4 \
--host 0.0.0.0 \
--port 30000
It fails with:
[2026-06-24 20:18:39] Unexpected routed-expert safetensors dtype=BF16 for DeepSeek V4
[2026-06-24 20:18:39] Hybrid swa model: self.hf_config.architectures=['DeepseekV4ForCausalLM']
[transformers] Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'attention_factor'}
[2026-06-24 20:18:40] kill_process_tree called: parent_pid=12771, include_parent=False, pid=12771
Traceback (most recent call last):
File "/usr/local/bin/sglang", line 6, in <module>
sys.exit(main())
^^^^^^
File "/sgl-workspace/sglang/python/sglang/cli/main.py", line 40, in main
serve(args, extra_argv)
File "/sgl-workspace/sglang/python/sglang/cli/serve.py", line 128, in serve
run_server(server_args)
File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 50, in run_server
launch_server(server_args)
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 2401, in launch_server
) = Engine._launch_subprocesses(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 866, in _launch_subprocesses
tokenizer_manager, template_manager = init_tokenizer_manager_func(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 137, in init_tokenizer_manager
tokenizer_manager = TokenizerManagerClass(server_args, port_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 266, in __init__
self.init_tokenizer_and_processor()
File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 354, in init_tokenizer_and_processor
self.tokenizer = get_tokenizer(
^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers/tokenizer.py", line 499, in get_tokenizer
tokenizer = _auto_tokenizer_from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers/tokenizer.py", line 165, in _auto_tokenizer_from_pretrained
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 837, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1743, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1933, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_tokenizers.py", line 376, in __init__
raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece or tiktoken installed to convert a slow tokenizer to a fast one.
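The traceback ends inside `AutoTokenizer.from_pretrained`, so the failure can probably be narrowed down without SGLang in the loop. Below is a rough sketch of how that check might look. The helper names `missing_backends` and `try_load_tokenizer` are made up for illustration, and it assumes `transformers` is installed in the same environment as the server. Passing `trust_remote_code=True` mirrors the `--trust-remote-code` flag in the command above and runs code from the repo, so drop it if that is not wanted.

```python
import importlib.util

# Packages named in the ValueError, plus `tokenizers` itself (route 1 in the message).
CANDIDATE_BACKENDS = ("tokenizers", "sentencepiece", "tiktoken")


def missing_backends(names=CANDIDATE_BACKENDS):
    """Return the subset of top-level package names that cannot be found here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def try_load_tokenizer(repo_id):
    """Try loading only the tokenizer, outside any serving stack.

    Returns (ok, detail). Needs `transformers` plus network access or a
    local Hugging Face cache that already holds the repo.
    """
    # Imported lazily so missing_backends() still works without transformers.
    from transformers import AutoTokenizer

    try:
        # Assumption: mirrors --trust-remote-code from the serve command.
        tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, type(tok).__name__
```

Calling `missing_backends()` in the container shows whether `sentencepiece` or `tiktoken` is absent, and `try_load_tokenizer("Chunjiang-Intelligence/DeepSeek-v4-Fable")` shows whether the same `ValueError` appears with plain Transformers. If it does, the problem is more likely in the repo's tokenizer files or the environment than in the SGLang flags. Installing the missing packages may or may not be enough, since the message is generic and lists three separate routes.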