Instructions to use bsarpel/DeepSeek-V4-Flash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use bsarpel/DeepSeek-V4-Flash with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="bsarpel/DeepSeek-V4-Flash")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("bsarpel/DeepSeek-V4-Flash")
    model = AutoModelForCausalLM.from_pretrained("bsarpel/DeepSeek-V4-Flash")

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bsarpel/DeepSeek-V4-Flash with vLLM:
Install from pip and serve model

    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "bsarpel/DeepSeek-V4-Flash"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "bsarpel/DeepSeek-V4-Flash",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'

Use Docker
docker model run hf.co/bsarpel/DeepSeek-V4-Flash
- SGLang
How to use bsarpel/DeepSeek-V4-Flash with SGLang:
Install from pip and serve model

    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "bsarpel/DeepSeek-V4-Flash" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "bsarpel/DeepSeek-V4-Flash",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'

Use Docker images

    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "bsarpel/DeepSeek-V4-Flash" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "bsarpel/DeepSeek-V4-Flash",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'

- Docker Model Runner
How to use bsarpel/DeepSeek-V4-Flash with Docker Model Runner:
docker model run hf.co/bsarpel/DeepSeek-V4-Flash
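
The curl calls shown for the vLLM and SGLang servers can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style completions body with a `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("http://localhost:8000", "bsarpel/DeepSeek-V4-Flash", "Once upon a time,")` would target the vLLM port used above; for the SGLang commands, the base URL would be `http://localhost:30000`.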

    {
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the weather for a specific location",
            "parameters": {
              "type": "object",
              "properties": {
                "location": {
                  "type": "string",
                  "description": "The city name"
                },
                "unit": {
                  "type": "string",
                  "enum": ["celsius", "fahrenheit"],
                  "description": "Temperature unit"
                }
              },
              "required": ["location"]
            }
          }
        },
        {
          "type": "function",
          "function": {
            "name": "search",
            "description": "Search the web for information",
            "parameters": {
              "type": "object",
              "properties": {
                "query": {
                  "type": "string",
                  "description": "Search query"
                },
                "num_results": {
                  "type": "integer",
                  "description": "Number of results to return"
                }
              },
              "required": ["query"]
            }
          }
        }
      ],
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What's the weather in Beijing?"
        },
        {
          "role": "assistant",
          "reasoning_content": "The user wants to know the weather in Beijing. I should use the get_weather tool.",
          "tool_calls": [
            {
              "id": "call_001",
              "type": "function",
              "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"Beijing\", \"unit\": \"celsius\"}"
              }
            }
          ]
        },
        {
          "role": "tool",
          "tool_call_id": "call_001",
          "content": "{\"temperature\": 22, \"condition\": \"sunny\", \"humidity\": 45}"
        },
        {
          "role": "assistant",
          "reasoning_content": "Got the weather data. Let me format a nice response.",
          "content": "The weather in Beijing is currently sunny with a temperature of 22°C and 45% humidity."
        }
      ]
    }
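
In the conversation above, the assistant's `tool_calls` entry and the following `tool` message are linked by `tool_call_id`, and `arguments` is a JSON-encoded string rather than a nested object. A minimal sketch of the client-side step that turns one into the other might look like this; `run_tool_calls`, `TOOL_REGISTRY`, and the stubbed `get_weather` body are hypothetical and only mirror the example data.

```python
import json


# Hypothetical stand-in for a real weather lookup; it returns fixed data
# matching the tool result shown in the example conversation.
def get_weather(location, unit="celsius"):
    return {"temperature": 22, "condition": "sunny", "humidity": 45}


# Maps tool names (as declared under "tools") to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message, registry):
    """Execute each tool call and return the resulting "tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        name = call["function"]["name"]
        # "arguments" arrives as a JSON string and must be decoded first.
        kwargs = json.loads(call["function"]["arguments"])
        if name in registry:
            output = registry[name](**kwargs)
        else:
            output = {"error": f"unknown tool: {name}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # links the result back to the call
            "content": json.dumps(output),
        })
    return results


assistant_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_001",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"Beijing\", \"unit\": \"celsius\"}",
        },
    }],
}

tool_messages = run_tool_calls(assistant_message, TOOL_REGISTRY)
print(tool_messages[0]["content"])
```

The returned messages would be appended to `messages` before the next request, so the model can produce the final answer from the tool output.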