Missing Mid Training Data
Clicking the mid-training data link leads to a 404 page. Was the dataset removed, or has it not been uploaded yet?
https://huggingface.co/datasets/osunlp/QUEST-Mid-Training-Data
Thanks for the question! The mid-training data is a bit tricky because it contains raw HTML content, which might raise legal concerns. We are still looking for the best way to handle this. It will be released once we find a suitable solution.
Hi @hsaest ,
Thanks for the clarification! Since we are reproducing the mid-training data, could you confirm whether the two atomic tasks can be reconstructed from successful search trajectories with the following setup?
Context Summarization: Slice trajectories into checkpoints. Pass historical interactions as events and the last state as prev_state, then use your open-sourced Quest Prompt to distill the target JSON.
Relevant Information Extraction: Extract triplets of (webpage content, extraction goal, extracted content) directly from the tool cache, and apply Jaccard similarity (0.1) to deduplicate the goals.
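For concreteness, the goal deduplication step proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: the thread gives the value 0.1 but not how it is applied, so the code assumes that a goal is dropped when its word-level Jaccard similarity to any already-kept goal is at least 0.1. The tokenizer and the function names (`tokenize`, `jaccard`, `dedup_triplets`) are hypothetical and not taken from the QUEST code.

```python
import re


def tokenize(goal: str) -> set:
    """Lowercase word tokens of an extraction goal (assumed tokenization)."""
    return set(re.findall(r"\w+", goal.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup_triplets(triplets, threshold=0.1):
    """Keep a (page, goal, extracted) triplet only if its goal is not too
    similar to the goal of any triplet kept so far.

    Assumption: similarity >= threshold means "duplicate".
    """
    kept = []
    kept_tokens = []
    for page, goal, extracted in triplets:
        tokens = tokenize(goal)
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # near-duplicate goal, skip this triplet
        kept.append((page, goal, extracted))
        kept_tokens.append(tokens)
    return kept


# Hypothetical triplets as they might come out of a tool cache.
examples = [
    ("<page A>", "find the release date of the model", "2024"),
    ("<page B>", "find the release date of the dataset", "2025"),
    ("<page C>", "list all authors", "A, B, C"),
]
unique = dedup_triplets(examples)
```

With a threshold as low as 0.1, goals that share even a single common word can be treated as duplicates, so the choice of tokenizer (and whether stop words are removed) matters a lot in practice.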
One quick question on the input data: since your paper mentions using raw HTML content for the extraction task, would substituting cleaned or processed webpage text (e.g., Markdown or plain text with headers and footers removed) significantly affect the model's grounding and data triage capabilities during target-only loss training?
Thanks for your insights!
Hi @danqiao-cuhk ,
We have released the context summarization data at https://huggingface.co/datasets/osunlp/QUEST-Mid-Training-Data, along with a minimal example of Relevant Information Extraction.
For Context Summarization reconstruction, you can refer to our data directly. The input is the "historical session", and the output is the summarized content generated by the condenser.
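For anyone rebuilding such pairs from their own trajectories, the checkpoint-based setup asked about earlier in this thread could be sketched as follows. This is a minimal sketch, not the QUEST pipeline: it assumes that `events` holds only the interactions since the previous checkpoint, that `prev_state` is the summary produced at the previous checkpoint, and that `distill` is a hypothetical callable wrapping whatever model and prompt produce the target summary. The record keys are illustrative as well.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]
State = Dict[str, Any]


def build_summarization_examples(
    events: List[Event],
    checkpoint_every: int,
    distill: Callable[[List[Event], Optional[State]], State],
) -> List[Dict[str, Any]]:
    """Slice one trajectory into checkpoints and build (input, target) records.

    Each record carries the events of one window, the state from the previous
    checkpoint (None for the first window), and the distilled target state.
    """
    if checkpoint_every <= 0:
        raise ValueError("checkpoint_every must be positive")
    records = []
    prev_state: Optional[State] = None
    for start in range(0, len(events), checkpoint_every):
        window = events[start:start + checkpoint_every]
        target = distill(window, prev_state)  # hypothetical LLM call
        records.append(
            {"events": window, "prev_state": prev_state, "target": target}
        )
        prev_state = target  # the next checkpoint builds on this summary
    return records


def count_distill(window: List[Event], prev: Optional[State]) -> State:
    """Stand-in for the real condenser: just counts events seen so far."""
    seen = prev["n_events"] if prev else 0
    return {"n_events": seen + len(window)}


trajectory = [{"step": i} for i in range(5)]
records = build_summarization_examples(trajectory, 2, count_distill)
```

Passing the distiller in as a function keeps the slicing logic testable without a model; swapping `count_distill` for a real model call does not change the loop.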
For Relevant Information Extraction, your understanding is correct.
Our raw HTML content is preprocessed with Jina Reader, whose output includes the title, the URL, and the Markdown content. You can find an example at https://jina.ai/api-dashboard/reader:
Title: Example Domain
URL Source: https://www.example.com/
Published Time: Fri, 19 Jun 2026 18:46:03 GMT
Warning: This is a cached snapshot of the original page, consider retry with caching opt-out.
Markdown Content:
This domain is for use in documentation examples without needing permission. Avoid use in operations.
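To feed text in this format into an extraction pipeline, it can be split into its header fields and the Markdown body. The parser below is a rough sketch based only on the example shown above: the header names are taken from that example, real output may contain other fields, and `parse_reader_output` is a hypothetical helper, not part of any Jina or QUEST library.

```python
# Header labels as they appear in the example above, mapped to short keys.
HEADER_KEYS = {
    "Title": "title",
    "URL Source": "url",
    "Published Time": "published_time",
    "Warning": "warning",
}
BODY_MARKER = "Markdown Content:"


def parse_reader_output(text: str) -> dict:
    """Split reader-style text into header fields and a 'markdown' body."""
    fields = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(BODY_MARKER):
            # Everything after the marker (same line or following lines) is body.
            inline = line[len(BODY_MARKER):].strip()
            body = lines[i + 1:]
            if inline:
                body = [inline] + body
            fields["markdown"] = "\n".join(body).strip()
            break
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() in HEADER_KEYS:
            fields[HEADER_KEYS[key.strip()]] = value.strip()
    return fields


SAMPLE = """Title: Example Domain
URL Source: https://www.example.com/
Published Time: Fri, 19 Jun 2026 18:46:03 GMT
Warning: This is a cached snapshot of the original page, consider retry with caching opt-out.
Markdown Content:
This domain is for use in documentation examples without needing permission. Avoid use in operations.
"""

parsed = parse_reader_output(SAMPLE)
```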
Let me know if you have any further questions! Thanks for your interest in our work!