Instructions for using GritLM/GritLM-7B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use GritLM/GritLM-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GritLM/GritLM-7B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("GritLM/GritLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("GritLM/GritLM-7B", trust_remote_code=True, device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use GritLM/GritLM-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GritLM/GritLM-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GritLM/GritLM-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/GritLM/GritLM-7B

SGLang

How to use GritLM/GritLM-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "GritLM/GritLM-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GritLM/GritLM-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "GritLM/GritLM-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GritLM/GritLM-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use GritLM/GritLM-7B with Docker Model Runner:

```
docker model run hf.co/GritLM/GritLM-7B
```

Difference in performance - AutoModel vs. Sentence Transformers

by yearivig - opened Jun 9, 2024

Discussion

yearivig

Jun 9, 2024

Hi,
I recently ran the MTEB benchmark (focusing on the classification tasks) and got different results when I loaded the model with AutoModel (using last-token pooling) than when I loaded it through the Sentence Transformers package (with the default config). Can someone help me figure this out?

Muennighoff

GritLM org Jun 9, 2024

The model usage is documented here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#inference
The model is not compatible with Sentence Transformers and does not use last-token pooling, so both of those setups will lead to suboptimal performance.
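
To illustrate why the pooling choice changes the resulting embedding, here is a toy sketch in plain Python comparing last-token pooling with mean pooling over non-padding tokens. The hidden states and mask are made-up numbers, and the helper names are hypothetical. Mean pooling is shown only as a contrast and as what I understand to be the default in the gritlm package; the linked README remains the authority on what the model actually expects.

```python
def last_token_pool(hidden, mask):
    """Return the hidden state of the last non-padding token."""
    last = max(i for i, m in enumerate(mask) if m)
    return list(hidden[last])


def mean_pool(hidden, mask):
    """Average the hidden states over non-padding tokens only."""
    kept = [h for h, m in zip(hidden, mask) if m]
    dim = len(hidden[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


# Toy per-token hidden states (4 tokens, 2 dimensions); the last token is padding.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

last_vec = last_token_pool(hidden, mask)  # [5.0, 4.0]
mean_vec = mean_pool(hidden, mask)        # [3.0, 2.0]
print(last_vec, mean_vec)
```

The two vectors differ even on this tiny input, so a benchmark scored on embeddings pooled differently from how the model was trained can be expected to give different numbers.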

yearivig

Jun 9, 2024 (edited Jun 9, 2024)

So are you saying that loading the model with the gritlm package, as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto"), should give me the best results on MTEB?

Muennighoff

GritLM org Jun 9, 2024

> So are you saying that loading the model with the gritlm package, as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto"), should give me the best results on MTEB?

Yes! You should be able to reproduce the results reported for GritLM-7B. You can, for example, use this script: https://github.com/ContextualAI/gritlm/blob/main/README.md#embedding
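
For reference, a minimal sketch of embedding with the gritlm package, following the pattern in the linked README as I understand it. The `embed_texts` wrapper is hypothetical and added only for illustration; the prompt format in `gritlm_instruction` and the `encode` call should be checked against the README before relying on them.

```python
def gritlm_instruction(instruction: str) -> str:
    """Wrap an optional task instruction in the embedding prompt format."""
    if instruction:
        return "<|user|>\n" + instruction + "\n<|embed|>\n"
    return "<|embed|>\n"


def embed_texts(texts, instruction=""):
    """Hypothetical wrapper: embed texts with gritlm (pip install gritlm)."""
    # Imported lazily so the formatting helper above works without the package.
    from gritlm import GritLM

    # Loads the full 7B model on every call; in real use, load it once and reuse it.
    model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")
    return model.encode(texts, instruction=gritlm_instruction(instruction))


# The helper can be inspected without downloading the model:
print(repr(gritlm_instruction("")))
```

Calling `embed_texts(["some document"])` would download and load the model, so it is left uncalled here.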

yearivig

Jun 9, 2024

Thank you!
Actually, I’m looking for the right configuration for using this model when it is loaded with AutoModel, including which pooling method I should use. I want to be able to add past_key_values to my context, which is possible with AutoModel. Are you familiar with such a configuration?
