Enhancing Accessibility and Market Reach through llama.cpp Support
Dear InternLM Team,
I hope this message finds you well. As we continue to push the boundaries of language model development, I would like to bring to your attention a crucial aspect that can significantly impact the adoption and popularity of your Large Language Models (LLMs). While achieving impressive benchmarks is indeed a remarkable accomplishment, it is equally essential to ensure that your models are accessible and usable by a broader audience.
In the lower market segment, where your LLMs are likely to have the most significant impact, the preferred method of running LLMs is through llama.cpp. This tool has become a de facto standard for many developers and users in this space. However, I noticed that your models currently lack support in llama.cpp.
I strongly recommend that the team allocate some effort to adding support in llama.cpp. Doing so would significantly enhance the accessibility and usability of your LLMs, making them more attractive to a wider range of users. This, in turn, would increase the likelihood of your models gaining popularity and widespread adoption.
In today's competitive landscape, it is not enough to simply have impressive benchmarks. To truly succeed, you must also prioritize the needs and preferences of your users. By supporting llama.cpp, you will demonstrate your commitment to making your LLMs usable by the people who need them most.
Thank you for your attention to this matter, and I look forward to seeing the positive impact that llama.cpp support will have on your LLMs.
Best regards,
Charles McSneed
Great suggestion. I think you should post this feedback on GitHub, where it is more likely to get attention from the official team.
Great discussion! For anyone wanting to quickly test this, Crazyrouter offers API access to this model. No infrastructure setup needed: just an API key and the standard OpenAI SDK.