@check_model_inputs # <--- this is the line that causes the problem

#11
by aifeifei798 - opened

modeling_youtu.py

-> 474 line


@check_model_inputs # <--- this is the line that causes the problem
@auto_docstring
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
# ... other arguments ...
):


# @check_model_inputs # <--- commented out

@auto_docstring
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
# ... other arguments ...
):


import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Model configuration
# NOTE(review): assumes the checkpoint directory exists locally — confirm path.
model_id = "./Youtu-LLM-2B"

# 2. Initialize tokenizer and model
# NOTE(review): trust_remote_code executes code shipped with the checkpoint;
# only enable it for checkpoints you trust.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

# 3. Build the chat input
prompt = "Hello"
messages = [{"role": "user", "content": prompt}]

# Use apply_chat_template to construct the prompt token ids
# (enable_thinking=True asks the template to allow a <think> reasoning block)
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True
).to(model.device)

# Manually create the attention mask (all ones — a single un-padded prompt)
attention_mask = torch.ones_like(input_ids).to(model.device)


# 4. Generate the reply
# With the faulty decorator removed, the standard calling convention works
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    repetition_penalty=1.05,
    pad_token_id=tokenizer.eos_token_id # set explicitly to silence the missing-pad-token warning
)

# 5. Parse the result (decode prompt + completion as one string)
full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

def parse_reasoning(text):
    """Split *text* into the <think>...</think> reasoning and the trailing answer.

    Returns a ``(thought, answer)`` tuple. When no <think> block is found,
    a placeholder thought is returned together with the text unchanged.
    """
    found = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if found is None:
        return "(No explicit thought process generated)", text
    reasoning = found.group(1).strip()
    reply = text.split("</think>")[-1].strip()
    return reasoning, reply

thought, final_answer = parse_reasoning(full_response)

# Render the two sections with identical banner formatting.
separator = "=" * 20
print(f"\n{separator} Thought Process {separator}\n{thought}")
print(f"\n{separator} Final Answer {separator}\n{final_answer}")
Tencent org

Hi,

May I ask which version of transformers you are working with? We are running this demo code with transformers==4.56.0, and the output is normal:

==================== Thought Process ====================
The user greeted with 'Hello', which is a simple and friendly opening. Since the input is in English, I should respond in English as per the instruction. I need to introduce myself clearly according to the defined identity: state my name (Youtu-llm), developer (Tencent Youtu team), purpose (helping users solve problems), key capabilities (mathematics, coding, Agent), and goal (efficient and accurate problem-solving). The response should be welcoming and open-ended to encourage further interaction, while staying within the provided identity constraints. No extra information beyond what is specified should be added.

==================== Final Answer ====================
Hello! I am Youtu-llm, a large language model developed by the Tencent Youtu team. I am designed to assist users in solving various problems, excelling in tasks such as mathematics, coding, and Agent-related operations. My goal is to make problem-solving more efficient and accurate through intelligent interaction. How can I assist you today?
Tencent org

Hi @aifeifei798 ,

We do a further check in terms of transformers' version. Here are some suggestions:
(1) We recommend limiting its version: pip install "transformers>=4.56.0,<=4.57.1", which is compatible with the current remote code;
(2) Do not use transformers==4.57.2, since there is a bug unfixed (https://github.com/huggingface/transformers/issues/42395);
(3) If you would like to maintain a higher version (e.g., 4.57.3), you can comment out "check_model_inputs" as what you have done, or modify it to "check_model_inputs()", following the patch (https://github.com/huggingface/transformers/commit/ede92a8755e48da7ae1d1b7d976ad581aa5c8327#diff-00deeb775525887b5d4f029e8084dd85737e561d4e2606ec8b4787f55d6cf286).

Thank you for the reminder; we will modify the README as described above.

Okay, good job.

aifeifei798 changed discussion status to closed

Sign up or log in to comment