Math problems
I gave the model a random arithmetic expression to evaluate, and it can't manage to get it right.
I ask: "What is 2 + 2 × 7 ÷ 6 + 7 ÷ (2 × 2) × (21 × 32)"
Qwen2-7B-Instruct responds:
To solve the expression \(2 + 2 \times 7 \div 6 + 7 \div (2 \times 2) \times (21 \times 32)\), we need to follow the order of operations, often remembered by the acronym PEMDAS: Parentheses, Exponents, Multiplication and Division from left to right, Addition and Subtraction from left to right.
Let's solve it step-by-step:
Step 1: Solve inside Parentheses
- \(2 \times 7\)
- \(2 \times 2\)
- \(21 \times 32\)
Step 2: Perform Multiplication and Division in order from left to right
Step 3: Add the remaining numbers
Let's calculate:
Inside Parentheses:
- \(2 \times 7 = 14\)
- \(2 \times 2 = 4\)
- \(21 \times 32 = 672\)
So, the expression becomes:
\[2 + 14 \div 6 + 7 \div 4 \times 672\]
Next Steps:
Now we solve multiplication and division from left to right:
- \(14 \div 6 = 2.33...\) (approximated for simplification, exact value would be more precise)
- \(7 \div 4 = 1.75\)
Then multiply this result by 672:
- \(1.75 \times 672\)
Let's calculate these:
Calculation of Multiplication and Division:
- \(14 \div 6 = 2.33...\) (approximated for simplicity)
- \(7 \div 4 = 1.75\)
\[1.75 \times 672 = 1188\]
So the expression now looks like:
\[2 + 2.33... + 1188\]
Final Addition:
- Add all numbers together:
\[2 + 2.33... + 1188 = 1192.33...\]
Rounding to the nearest whole number or keeping it precise depending on context, we might express this as approximately \(1192\) if rounding was intended or precisely \(1192.33...\).
Therefore, following the order of operations and calculations:
\[2 + 2 \times 7 \div 6 + 7 \div (2 \times 2) \times (21 \times 32)\] equals approximately \(1192\) or precisely \(1192.33...\), depending on rounding rules applied.
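For anyone trying to pin down where this response goes wrong, here is a rough Python sketch that recomputes each intermediate value stated in the transcript above. The `claimed` table and the `wrong_steps` helper are just illustrative names, not anything from the model or Ollama. As far as I can tell, only the multiplication step is off, and the difference (12) carries straight through to the final sum.

```python
# Each entry maps a step to (value Python computes, value the model stated).
claimed = {
    "2 * 7": (2 * 7, 14),
    "2 * 2": (2 * 2, 4),
    "21 * 32": (21 * 32, 672),
    "7 / 4": (7 / 4, 1.75),
    "1.75 * 672": (1.75 * 672, 1188),
}


def wrong_steps(steps):
    """Return the names of the steps whose stated value differs from the computed one."""
    return [name for name, (actual, stated) in steps.items() if actual != stated]


for name, (actual, stated) in claimed.items():
    status = "ok" if actual == stated else "MISMATCH"
    print(f"{name}: model said {stated}, actual {actual} -> {status}")

print("Steps that do not check out:", wrong_steps(claimed))
```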
Yi-9B responds:
To solve this problem, we'll follow the order of operations, which is often remembered by the acronym PEMDAS: Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction (from left to right).
Given expression: 2 + 2 × 7 ÷ 6 + 7 ÷ (2 × 2) × (21 × 32)
First, we'll solve the parentheses and exponents (there are none in this problem), then multiplication and division from left to right, followed by addition and subtraction from left to right.
Let's break it down step by step:
Multiplication inside parentheses: 2 × 2 = 4
Updated expression: 2 + 2 × 7 ÷ 6 + 7 ÷ 4 × (21 × 32)
Multiplication and division from left to right: 2 × 7 = 14; 14 ÷ 6 = 2.3333 (rounded to four decimal places)
Updated expression: 2 + 2.3333 + 7 ÷ 4 × (21 × 32)
Multiplication inside brackets: 21 × 32 = 672
Updated expression: 2 + 2.3333 + 7 ÷ 4 × 672
Division: 7 ÷ 4 = 1.75
Updated expression: 2 + 2.3333 + 1.75 × 672
Multiplication: 1.75 × 672 = 1176
Final expression: 2 + 2.3333 + 1176
Addition from left to right: 2 + 2.3333 = 4.3333; 4.3333 + 1176 = 1180.3333
So, the result of 2 + 2 × 7 ÷ 6 + 7 ÷ (2 × 2) × (21 × 32) is approximately 1180.3333 when rounded to four decimal places.
The Gemini app responds:
I've calculated the expression following the order of operations (PEMDAS) and accounting for implicit multiplication. The answer is 1,180.33333.
And my Pixel's calculator responds:
1180.3333333333
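As a sanity check on the reference value, the same expression can be evaluated exactly with Python's standard `fractions` module, which avoids any rounding of 14 ÷ 6. This is only a sketch for verification; Python's `*` and `/` have equal precedence and associate left to right, which matches the PEMDAS reading used above.

```python
from fractions import Fraction as F

# 2 + 2 × 7 ÷ 6 + 7 ÷ (2 × 2) × (21 × 32), kept as exact rationals.
exact = F(2) + F(2) * 7 / 6 + F(7) / (2 * 2) * (21 * 32)

print(exact)                   # 3541/3
print(f"{float(exact):.4f}")   # 1180.3333
```

The exact result is 3541/3, which agrees with the calculator and with the Yi-9B and Gemini answers.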
Both local models are running under Ollama with the default configs.
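To reproduce this, something like the following sketch should work against a local Ollama server. It assumes the server is listening on Ollama's default port (11434) and uses the `/api/generate` endpoint with streaming turned off. The helper names `build_request` and `ask` are hypothetical, and the model tags in the usage note are my guess at the tags involved, not something confirmed in this thread.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "What is 2 + 2 × 7 ÷ 6 + 7 ÷ (2 × 2) × (21 × 32)"


def build_request(model: str, prompt: str = PROMPT) -> dict:
    """Build the JSON payload for a single, non-streaming generation."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(model: str, prompt: str = PROMPT) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the models pulled, calling `ask("qwen2:7b-instruct")` and `ask("yi:9b")` (assumed tags) should return each model's answer. Sampling is not pinned here, so repeated runs may give different results.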