Tags: Text Generation · Transformers · ONNX · Safetensors · multilingual · qwen2 · conversational · text-generation-inference · 🇪🇺 Region: EU
Instructions for using jinaai/ReaderLM-v2 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use jinaai/ReaderLM-v2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jinaai/ReaderLM-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jinaai/ReaderLM-v2")
model = AutoModelForCausalLM.from_pretrained("jinaai/ReaderLM-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
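Beyond these generic snippets, ReaderLM-v2's primary use case is converting raw HTML into clean Markdown. A minimal sketch of that workflow follows; the instruction wording is an assumption and may differ from the official prompt template:

```python
# HTML-to-Markdown conversion; the instruction text is an assumption,
# not necessarily the official ReaderLM-v2 prompt template.
from transformers import pipeline

pipe = pipeline("text-generation", model="jinaai/ReaderLM-v2")

html = "<html><body><h1>Title</h1><p>Hello <a href='/x'>world</a>.</p></body></html>"
messages = [
    {
        "role": "user",
        "content": (
            "Extract the main content from the given HTML and convert it "
            f"to Markdown format.\n\n{html}"
        ),
    },
]
result = pipe(messages, max_new_tokens=512)
# With chat-style input, generated_text holds the whole conversation;
# the last turn is the model's Markdown output.
print(result[0]["generated_text"][-1]["content"])
```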
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use jinaai/ReaderLM-v2 with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "jinaai/ReaderLM-v2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jinaai/ReaderLM-v2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
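The same server can also be called from Python with the official openai client; a minimal sketch, assuming the default port used above (vLLM accepts any placeholder API key unless --api-key is configured):

```python
# Query the local vLLM server via its OpenAI-compatible API.
from openai import OpenAI

# The key is a placeholder; vLLM ignores it unless --api-key is set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="jinaai/ReaderLM-v2",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```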
- SGLang
How to use jinaai/ReaderLM-v2 with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "jinaai/ReaderLM-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jinaai/ReaderLM-v2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "jinaai/ReaderLM-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jinaai/ReaderLM-v2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
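Like vLLM, SGLang serves an OpenAI-compatible API, so the endpoint can be queried from plain Python as well; a minimal sketch, assuming the server above is listening on port 30000:

```python
# Query the local SGLang server via its OpenAI-compatible REST API.
import requests

response = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "jinaai/ReaderLM-v2",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```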
- Docker Model Runner
How to use jinaai/ReaderLM-v2 with Docker Model Runner:
```bash
docker model run hf.co/jinaai/ReaderLM-v2
```
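Docker Model Runner also exposes an OpenAI-compatible endpoint. A minimal sketch of calling it from Python; both the base URL (host TCP access must be enabled, commonly on port 12434) and the model identifier are assumptions, so check your local setup:

```python
# Query Docker Model Runner's OpenAI-compatible endpoint.
# ASSUMPTIONS: the base URL requires host TCP access to be enabled
# (commonly port 12434), and the model id mirrors the pull reference;
# adjust both to your local configuration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="hf.co/jinaai/ReaderLM-v2",
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(completion.choices[0].message.content)
```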
KeyError: 'lm_head.weight' when using sglang to load this model
#8 · opened by aqweteddy
Are there any changes to the architecture of this model compared to Qwen?
Full error message:
```
[2025-01-21 05:11:11 DP1 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1747, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 239, in __init__
    self.tp_worker = TpWorkerClass(
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 185, in __init__
    self.load_model()
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 303, in load_model
    self.model = get_model(
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 362, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 395, in load_weights
    param = params_dict[name]
KeyError: 'lm_head.weight'
```
Well, I just realized that lm_head.weight is also included in the safetensors file, which conflicts with the config setting tie_word_embeddings=true. I just fixed this issue in the latest commit, please give it a try. Thank you for your report, it helps us a lot.
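For reference, the mismatch described above can be checked locally; a minimal sketch, assuming the checkpoint is a single shard named model.safetensors (sharded checkpoints use an index file instead):

```python
# Check whether the checkpoint ships an explicit lm_head.weight even
# though the config ties it to the input embeddings; that combination
# is what made SGLang's Qwen2 weight loader raise KeyError.
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoConfig

config = AutoConfig.from_pretrained("jinaai/ReaderLM-v2")
# Assumes a single-shard checkpoint named model.safetensors.
path = hf_hub_download("jinaai/ReaderLM-v2", "model.safetensors")
with safe_open(path, framework="pt") as f:
    has_lm_head = "lm_head.weight" in f.keys()

print("tie_word_embeddings:", config.tie_word_embeddings)
print("checkpoint ships lm_head.weight:", has_lm_head)
```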
aqweteddy changed discussion status to closed