Difference in performance - AutoModel vs. Sentence Transformers
Hi,
I recently ran the MTEB benchmark (focusing on the classification tasks) and got different results when I loaded the model with AutoModel (and applied last-token pooling) than when I loaded it through the Sentence Transformers package (with the default config). Can someone help me figure this out?
The model usage is documented here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#inference
It is not compatible with Sentence Transformers and does not use last-token pooling, so both of those setups will lead to suboptimal performance.
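To illustrate why the pooling choice alone can change benchmark scores, here is a minimal sketch comparing last-token pooling with mean pooling on toy data. The hidden states, mask, and function names are hypothetical, and the sketch assumes mean pooling over non-padding tokens as the alternative; check the linked README for the exact pooling and attention settings GritLM expects.

```python
# Toy comparison of two pooling strategies over per-token hidden states.
# Plain Python lists stand in for the tensors a real model would return.

def last_token_pool(hidden, mask):
    """Return the hidden state of the last non-padding token."""
    last = max(i for i, m in enumerate(mask) if m == 1)
    return hidden[last]

def mean_pool(hidden, mask):
    """Average the hidden states of all non-padding tokens."""
    kept = [h for h, m in zip(hidden, mask) if m == 1]
    dim = len(hidden[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]

# Four token positions with 2-dimensional hidden states; the last is padding.
hidden_states = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 1, 0]

last_vec = last_token_pool(hidden_states, attention_mask)
mean_vec = mean_pool(hidden_states, attention_mask)
print(last_vec, mean_vec)
```

The two vectors differ even though the inputs are identical, so a classifier trained on one kind of embedding will not score the same as one trained on the other.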
So are you saying that loading the model with the gritlm package, as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto"), should give me the best results on MTEB?
Yes! You should be able to reproduce the reported GritLM-7B results. You can, for example, use this script: https://github.com/ContextualAI/gritlm/blob/main/README.md#embedding
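For orientation, the embedding usage in the linked README follows roughly the pattern below. This is a sketch rather than the authors' full script: embed_with_gritlm is a hypothetical wrapper name, and the prompt format and the instruction argument of encode are assumed to match the current gritlm package, so verify them against the README before relying on the numbers.

```python
def gritlm_instruction(instruction: str) -> str:
    """Wrap an optional task instruction in the embedding prompt format."""
    if instruction:
        return "<|user|>\n" + instruction + "\n<|embed|>\n"
    return "<|embed|>\n"

def embed_with_gritlm(texts, instruction=""):
    """Hypothetical helper: embed texts through the gritlm package.

    Calling this downloads the 7B checkpoint, so it needs a suitable GPU
    and the package installed (pip install gritlm).
    """
    from gritlm import GritLM  # imported lazily so the helper above works alone

    model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")
    return model.encode(texts, instruction=gritlm_instruction(instruction))

# The prompt wrapper can be inspected without loading the model.
no_instruction_prompt = gritlm_instruction("")
task_prompt = gritlm_instruction("Classify the sentiment of the given review")
```

Calling embed_with_gritlm(["some text"]) would then return one embedding per input text, using the package's own pooling rather than a hand-written one.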
Thank you!
Actually, I'm looking for the right configuration for using this model when it is loaded with AutoModel, including which pooling method I should use. I want to pass past_key_values along with my context, which AutoModel supports. Are you familiar with such a configuration?