hello? I know it's a reasoning model, but some of its behavior is just way off.
It's clearly overfitted.
Our model is an experimental research prototype dedicated to mathematical reasoning, released specifically to validate the claims in our paper. It relies on a math-only base model with further post-training focused on math and code. Consequently, it has not been aligned for general conversational capabilities. We do not recommend using this model for general chat, as it is biased towards responding from a problem-solving perspective. Additionally, please note that running inference with quantized versions may lead to increased hallucinations when testing general conversation scenarios.
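It's clearly overfitted.
Our training process went through rigorous decontamination, and the model generalizes to other unseen problems within math and competition-style programming. We do not recommend testing this model on everyday conversation, because it was post-trained from the Qwen2.5 math base model on math, code, and STEM data and was not specifically optimized (for example, with RLHF) for everyday Q&A. What you are seeing is not an overfitting problem.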
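For anyone who wants to try the model the way the replies above suggest (a single math or competition-style coding problem, no small talk, non-quantized weights), here is a minimal sketch using Transformers. The helper names `build_math_messages` and `solve` are made up for illustration, and the `max_new_tokens=2048` budget is an assumption, not a setting recommended by the authors.

```python
MODEL_ID = "WeiboAI/VibeThinker-1.5B"


def build_math_messages(problem: str) -> list[dict]:
    """Wrap one math or coding problem as a single-turn chat (hypothetical helper)."""
    problem = problem.strip()
    if not problem:
        raise ValueError("problem must be a non-empty string")
    return [{"role": "user", "content": problem}]


def solve(problem: str, max_new_tokens: int = 2048) -> str:
    """Generate a solution with the original (non-quantized) weights.

    max_new_tokens=2048 is an illustrative budget, not an official setting.
    """
    # Imported lazily so build_math_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer.apply_chat_template(
        build_math_messages(problem),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

Calling `print(solve("Find all real x such that x^2 - 5x + 6 = 0."))` downloads the weights on first use and prints the model's reasoning and answer. A casual prompt like "Who are you?" is exactly the kind of input the authors say the model was not aligned for.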
