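The prompt still seems to have some issues
I asked a question and found that appending \nAssistant: to the end of the query gives a much better answer. Could you take a look?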
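# model and tokenizer are assumed to be loaded already.
# Query in English: "Xiao Ming's father has three sons. Xiao Ming was born in 1992, the eldest son is
# named Da Mao, the second son is named Er Mao, and this year is 2023. How old is the third son this year?"
query = """小明的爸爸有三个儿子,小明1992年出生,大儿子叫大毛,二儿子叫二毛,今年是2023年,请问三儿子今年多大了"""
# Variant 1: "\nAssistant:" appended to the query by hand.
response, history = model.chat(tokenizer, f"{query}\nAssistant:", history=[],
    temperature=0.01, top_p=0.01, repetition_penalty=1.0, max_new_tokens=1024, do_sample=False)
print(response)
# Variant 2: the plain query.
response, history = model.chat(tokenizer, query, history=[],
    temperature=0.01, top_p=0.01, repetition_penalty=1.0, max_new_tokens=1024, do_sample=False)
print(response)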
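Output with \nAssistant: appended (translated from Chinese):
According to the problem, Xiao Ming's father has three sons:
- Eldest son: Da Mao
- Second son: Er Mao
- Third son: Xiao Ming
Xiao Ming was born in 1992 and this year is 2023, so Xiao Ming is 2023 - 1992 = 31 years old this year.
Therefore, the third son is 31 years old this year.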
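Output for the plain query (translated from Chinese):
According to the problem, Xiao Ming's father has three sons: the eldest, the second, and the third. Xiao Ming was born in 1992 and this year is 2023, so Xiao Ming is 2023-1992=31 years old this year.
Since the problem does not give the birth years of Da Mao and Er Mao, their ages cannot be calculated directly. However, we can infer some useful information from the problem:
- Da Mao is the eldest, so he should be older than Xiao Ming.
- Er Mao is the second son, so he should be younger than Da Mao but older than Xiao Ming.
- The third son is the third son, so he should be younger than Er Mao but older than Xiao Ming.
From the above, we can draw the following conclusions:
- The third son should be older than Xiao Ming, so he is at least 32 years old.
- The third son should be younger than Er Mao, so he is at most 32 years old.
Therefore, we can conclude that the third son should be 32 years old this year.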
https://huggingface.co/internlm/internlm2-chat-20b/blob/main/modeling_internlm2.py#L1144
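When you call the model.chat interface, the code linked above automatically wraps the query in the ChatML conversation template and adds the assistant prefix, so there is no need to append "Assistant" manually.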
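To make the point above concrete, here is a minimal sketch of ChatML-style prompt assembly. It is not the code from modeling_internlm2.py: the helper name build_chatml_prompt is hypothetical, and the <|im_start|> / <|im_end|> markers are assumed from the general ChatML convention. It only illustrates that the assistant prefix is added after the user turn either way, so a hand-written "\nAssistant:" ends up inside the user message.

```python
# Hypothetical helper illustrating ChatML-style prompt assembly.
# NOT the implementation in modeling_internlm2.py; the special tokens
# below follow the general ChatML convention and are an assumption.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(query, history=None):
    """Wrap past (user, assistant) turns and the new query in ChatML markers."""
    prompt = ""
    for user_msg, assistant_msg in history or []:
        prompt += f"{IM_START}user\n{user_msg}{IM_END}\n"
        prompt += f"{IM_START}assistant\n{assistant_msg}{IM_END}\n"
    # The new query becomes a user turn, followed by an open assistant turn.
    prompt += f"{IM_START}user\n{query}{IM_END}\n"
    prompt += f"{IM_START}assistant\n"
    return prompt


plain = build_chatml_prompt("How old is the third son?")
manual = build_chatml_prompt("How old is the third son?\nAssistant:")

print(plain)
print(manual)
```

Under these assumptions both prompts end with the same open assistant turn. The manual suffix only changes the text of the user message, which may be why the two answers differ even though no extra prefix is required.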