Missing preprocessor_config.json when deploying Darwin-28B models
Hello, I encountered some issues while trying to deploy this model.
I can deploy and use FINAL-Bench/Darwin-27B-Opus smoothly. Based on that setup, I tried deploying both FINAL-Bench/Darwin-28B-REASON and FINAL-Bench/Darwin-28B-Opus with commands similar to the following:
sglang serve --served-model-name default \
--model-path FINAL-Bench/Darwin-28B-Opus \
--reasoning-parser qwen3 --tool-call-parser qwen3_coder \
--mem-fraction-static 0.90 --context-length 262144 \
--tokenizer-worker-num 8 --schedule-policy lpm \
--cuda-graph-max-bs 16 --max-running-requests 16 \
--chunked-prefill-size 16384 --enable-mixed-chunk \
--tp-size 4 --pp-size 1 --dp-size 1 --ep-size 1 \
--trust-remote-code --enable-multimodal \
--host 0.0.0.0 --port 38011
However, I then get the following error:
OSError: Can't load image processor for '$HOME/.cache/huggingface/hub/models--FINAL-Bench--Darwin-28B-Opus/snapshots/165e98249be214cbdc19015fa565ad1b571f2e76/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '$HOME/.cache/huggingface/hub/models--FINAL-Bench--Darwin-28B-Opus/snapshots/165e98249be214cbdc19015fa565ad1b571f2e76/' is the correct path to a directory containing a preprocessor_config.json file
As a quick workaround, I tried copying preprocessor_config.json from FINAL-Bench/Darwin-27B-Opus and Qwen/Qwen3.6-27B into the model directory. After doing so, the service was able to start normally.
However, during agent-based calls (for example, from VS Code Copilot or Hermes), I consistently run into the same issue: the model outputs a brief reasoning sentence such as:
The user wants a summary of the entire DOC.md document, not just the selected section. Let me read the full document to provide a comprehensive summary.
and then immediately stops generating any further output.
I'm not sure what exactly is going wrong here. Specifically, I'm unclear about:
- how this model is supposed to be launched when preprocessor_config.json is missing;
- why the agent calls start behaving abnormally after this workaround.
If this is due to an issue in my deployment or configuration, I would really appreciate your guidance. On the other hand, if some files are missing from the model repository, I would be very grateful if you could help fix them.
Thank you for reporting this. We've fixed both issues.
1. Missing preprocessor_config.json
The file has been added to both repositories:
- FINAL-Bench/Darwin-28B-Opus: commit 950d8ed
- FINAL-Bench/Darwin-28B-REASON: commit 4669ba2
You no longer need the manual workaround.
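To confirm that a local snapshot actually picked up the new file after re-pulling, a small check along these lines may help. It is only a sketch: the helper name `missing_processor_files` is hypothetical, and the only file name taken from this thread is `preprocessor_config.json`.

```python
from pathlib import Path

# File named in the OSError quoted in this thread; extend the tuple if needed.
REQUIRED_FILES = ("preprocessor_config.json",)


def missing_processor_files(snapshot_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from snapshot_dir."""
    root = Path(snapshot_dir).expanduser()
    return [name for name in required if not (root / name).is_file()]
```

Calling it with the snapshot directory shown in the error message should return an empty list once the updated revision has been downloaded.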
2. Agent calls stopping after reasoning
This is caused by using --reasoning-parser qwen3 and --tool-call-parser qwen3_coder together, a known SGLang conflict that leaves content=null. Remove --reasoning-parser qwen3 and --enable-mixed-chunk from your launch command. The model will still emit <think>...</think> content normally; it just won't be split into a separate field.
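With the reasoning parser removed, a client that still wants the reasoning kept apart from the final answer could split it itself. The following is a minimal sketch, assuming the raw text contains complete <think>...</think> blocks as described above; `split_think` is a hypothetical helper, not part of SGLang.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text):
    """Split raw model output into (reasoning, answer).

    Assumes reasoning is wrapped in complete <think>...</think> blocks.
    If no complete block is found, the whole text is treated as the answer.
    """
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

An unterminated block is deliberately left in the answer untouched, so truncated generations remain visible rather than being silently dropped.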
Let us know if anything else comes up!
Thanks for the quick fix and clarification!
I'll pull the latest version again, and retest the agent invocation after removing --reasoning-parser qwen3 and --enable-mixed-chunk as suggested.
If I encounter any further issues, I'll provide the logs and reproduction steps. Thanks again!
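For a retest like the one described here, it can help to record which fields of each response are actually populated. The sketch below assumes the OpenAI-compatible response shape (`choices[0].message`), and it assumes `reasoning_content` is the field a reasoning parser would fill; both the helper name `summarize_choice` and that field name should be checked against the server actually in use.

```python
def summarize_choice(response):
    """Report which message fields are populated in a chat completion dict.

    Assumes the OpenAI-compatible shape: response["choices"][0]["message"].
    "reasoning_content" is an assumed field name and may be absent.
    """
    choice = response["choices"][0]
    message = choice.get("message", {})
    return {
        "finish_reason": choice.get("finish_reason"),
        "has_content": bool(message.get("content")),
        "has_reasoning": bool(message.get("reasoning_content")),
        "has_tool_calls": bool(message.get("tool_calls")),
    }
```

A summary showing reasoning present but neither content nor tool calls would match the symptom reported in this thread and is worth attaching to any follow-up report.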