Instructions to use internlm/internlm2_5-20b-chat with libraries, inference providers, notebooks, and local apps.

Libraries

How to use internlm/internlm2_5-20b-chat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="internlm/internlm2_5-20b-chat", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
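
When the pipeline is called with a list of chat messages, recent Transformers releases return the whole conversation under `generated_text`, with the model's reply as the last message. Below is a minimal sketch of pulling that reply out, assuming this output shape. The helper name `last_assistant_reply` is hypothetical, and the sample data is an illustrative stand-in, not real model output.

```python
def last_assistant_reply(outputs):
    """Return the assistant's reply text from a chat-style pipeline result.

    Assumes `outputs` looks like
    [{"generated_text": [{"role": ..., "content": ...}, ...]}].
    """
    conversation = outputs[0]["generated_text"]
    # Walk backwards so the most recent assistant turn wins.
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Illustrative stand-in for the value returned by pipe(messages).
sample_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply goes here)"},
        ]
    }
]
reply = last_assistant_reply(sample_outputs)
```

With a loaded pipeline, the same helper would be called as `last_assistant_reply(pipe(messages))`.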

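# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-20b-chat", trust_remote_code=True)
# AutoModelForCausalLM attaches the language-modeling head needed for text generation
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-20b-chat", trust_remote_code=True, device_map="auto")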

Local Apps

vLLM

How to use internlm/internlm2_5-20b-chat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "internlm/internlm2_5-20b-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "internlm/internlm2_5-20b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
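
The same request can be sent from Python instead of curl. This is a sketch using only the standard library, assuming the server is reachable at the address shown in the curl command and returns the usual OpenAI-style response with the reply under `choices[0].message.content`. The function names `build_chat_request`, `extract_reply`, and `ask` are hypothetical helpers, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of a chat completions response (bytes or str)."""
    parsed = json.loads(response_body)
    return parsed["choices"][0]["message"]["content"]


def ask(base_url, model, user_message, timeout_s=120):
    """Send one user message and return the reply; requires a running server."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_reply(response.read())
```

Once the server is up, `ask("http://localhost:8000", "internlm/internlm2_5-20b-chat", "What is the capital of France?")` mirrors the curl call above. Pointing `base_url` at another OpenAI-compatible server only requires changing the host and port.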

Use Docker

docker model run hf.co/internlm/internlm2_5-20b-chat

SGLang

How to use internlm/internlm2_5-20b-chat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "internlm/internlm2_5-20b-chat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "internlm/internlm2_5-20b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "internlm/internlm2_5-20b-chat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "internlm/internlm2_5-20b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
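
A 20B model can take a while to load, so a request sent right after starting the container may be refused. The sketch below polls the server until it responds before any chat request is sent. It assumes the server answers `GET /v1/models` with HTTP 200 once it is ready, which is part of the OpenAI-compatible surface but should be checked against your server version. `wait_until_ready` and its parameters are hypothetical, not part of SGLang.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout_s=600.0, interval_s=5.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the timeout elapses.

    `opener` and `sleep` are injectable so the loop can be exercised
    without a real server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For the container above, this would be called as `wait_until_ready("http://localhost:30000/v1/models")` before issuing the curl request.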

Docker Model Runner
How to use internlm/internlm2_5-20b-chat with Docker Model Runner:
```
docker model run hf.co/internlm/internlm2_5-20b-chat
```

Enhancing Accessibility and Market Reach through llama.cpp Support

by ChuckMcSneed - opened Aug 6, 2024

Discussion

ChuckMcSneed

Aug 6, 2024

Dear InternLM Team,

I hope this message finds you well. As we continue to push the boundaries of language model development, I would like to bring to your attention a crucial aspect that can significantly impact the adoption and popularity of your Large Language Models (LLMs). While achieving impressive benchmarks is indeed a remarkable accomplishment, it is equally essential to ensure that your models are accessible and usable by a broader audience.

In the lower market segment, where your LLMs are likely to have the most significant impact, the preferred method of running LLMs is through llama.cpp. This tool has become a de facto standard for many developers and users in this space. However, I noticed that your models currently lack support in llama.cpp.

I strongly recommend that the team allocate some effort to adding support in llama.cpp. By doing so, you will significantly enhance the accessibility and usability of your LLMs, making them more attractive to a wider range of users. This, in turn, will increase the likelihood of your models gaining popularity and widespread adoption.

In today's competitive landscape, it is not enough to simply have impressive benchmarks. To truly succeed, you must also prioritize the needs and preferences of your users. By supporting llama.cpp, you will demonstrate your commitment to making your LLMs usable by the people who need them most.

Thank you for your attention to this matter, and I look forward to seeing the positive impact that llama.cpp support will have on your LLMs.

Best regards,

Charles McSneed

XYZliang

Aug 11, 2024

Great suggestion. I think you should post this feedback on GitHub, where it is more likely to get the official team's attention.

xujfcn

Feb 24

Great discussion! For anyone wanting to quickly test this, Crazyrouter offers API access to this model. No infrastructure setup needed — just an API key and the standard OpenAI SDK.
