Instructions to use ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts
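The curl call to the vLLM server can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the server is reachable at `http://localhost:8000` as in the command above, and the helper names `build_chat_request` and `chat` are hypothetical, chosen for illustration. The response parsing assumes the usual OpenAI-style chat completion shape.

```python
import json
import urllib.request

MODEL_ID = "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` sends the same request as the curl example. Passing a different `base_url` points the same helper at another OpenAI-compatible endpoint.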
- SGLang
How to use ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts with Docker Model Runner:
docker model run hf.co/ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts
We look forward to a perfect AWQ or GPTQ quantized version.
Considering the enhanced programming and mathematical capabilities of the new R1 version, traditional quantization methods may need improvement so that the specialized knowledge behind mathematics and programming is preserved as far as possible rather than lost to quantization. With GPTQ quantized versions released by other organizations, we have observed a noticeably higher error rate than with the official version during longer programming tasks. To maintain programming and mathematical capabilities, a slightly larger memory footprint is acceptable. Taking a single 8-GPU H20 node with 768G of VRAM in total as the baseline, maintaining a 65,535-token context length under such hardware conditions would be excellent.
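To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch of weight storage under group-wise quantization. It rests on several assumptions that are not from this page: each group stores one 16-bit scale and one zero point as wide as the weights, the unquantized remainder is kept at 16 bits, and the total parameter count is taken as roughly 671B. The function names and the `quantized_fraction` parameter are hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16, zero_bits=None):
    """Bits per weight including per-group metadata.

    Assumes one scale and one zero point per group (an assumed layout;
    real GPTQ checkpoints may pack metadata differently).
    """
    if zero_bits is None:
        zero_bits = bits
    return bits + (scale_bits + zero_bits) / group_size


def estimate_weight_gb(n_params, quantized_fraction, bits=4, group_size=128, dense_bits=16):
    """Estimate weight storage in GB (10^9 bytes).

    quantized_fraction is the share of parameters stored quantized;
    the rest is assumed to stay at dense_bits per weight.
    """
    quantized = n_params * quantized_fraction
    dense = n_params - quantized
    total_bits = quantized * effective_bits_per_weight(bits, group_size) + dense * dense_bits
    return total_bits / 8 / 1e9


TOTAL_PARAMS = 671e9  # approximate total parameter count of DeepSeek-R1
VRAM_BUDGET_GB = 768  # the baseline named in the post above

# Lower bound: pretend every weight is quantized to 4 bits with group size 128.
lower_bound_gb = estimate_weight_gb(TOTAL_PARAMS, quantized_fraction=1.0)
print(f"weights (all quantized): ~{lower_bound_gb:.0f} GB")
print(f"left for everything else: ~{VRAM_BUDGET_GB - lower_bound_gb:.0f} GB")
```

Lowering `quantized_fraction` or raising `bits` shows how quickly a more conservative scheme eats into the remaining budget, which is the headroom available for a long context.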