Instructions for using RedHatAI/granite-3.1-2b-instruct-FP8-dynamic with libraries and local apps such as Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use RedHatAI/granite-3.1-2b-instruct-FP8-dynamic with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RedHatAI/granite-3.1-2b-instruct-FP8-dynamic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RedHatAI/granite-3.1-2b-instruct-FP8-dynamic")
model = AutoModelForCausalLM.from_pretrained("RedHatAI/granite-3.1-2b-instruct-FP8-dynamic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RedHatAI/granite-3.1-2b-instruct-FP8-dynamic with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
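The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server is listening on `http://localhost:8000` as in the command above, and the helper names `build_chat_request` and `send_chat_request` are hypothetical, chosen here for illustration.

```python
import json
import urllib.request

MODEL_ID = "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions.

    base_url is an assumption: it must match the host and port the server
    was started on.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the parsed JSON body.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the decoded response body. Passing `base_url="http://localhost:30000"` would target a server on a different port, such as the SGLang setup described further down.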

Use Docker

docker model run hf.co/RedHatAI/granite-3.1-2b-instruct-FP8-dynamic

SGLang

How to use RedHatAI/granite-3.1-2b-instruct-FP8-dynamic with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
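Because the endpoints above follow the OpenAI chat completions format, the generated text sits under `choices[0].message.content` in the JSON response. A small sketch of pulling it out is shown below; the function name `extract_reply` is hypothetical, and `example_response` is a hand-written, trimmed-down illustration of the response shape, not captured server output.

```python
def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written illustration of the response shape (NOT real server output).
example_response = {
    "object": "chat.completion",
    "model": "RedHatAI/granite-3.1-2b-instruct-FP8-dynamic",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(example_response))
```

Raising on an empty `choices` list is a design choice made for this sketch, so that an error payload from the server fails loudly instead of producing a confusing `KeyError` later.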

Docker Model Runner
How to use RedHatAI/granite-3.1-2b-instruct-FP8-dynamic with Docker Model Runner:
```
docker model run hf.co/RedHatAI/granite-3.1-2b-instruct-FP8-dynamic
```

granite-3.1-2b-instruct-FP8-dynamic

Commit History

Update README.md

38a3b7b
verified

shubhrapandit committed on Jan 28, 2025

Update README.md

4619ac5
verified

nm-research committed on Jan 25, 2025

Update README.md

1bcfafa
verified

nm-research committed on Jan 25, 2025

Update README.md

829c019
verified

nm-research committed on Jan 25, 2025

Update README.md

34b5056
verified

nm-research committed on Jan 20, 2025

Update README.md

c8b4ee2
verified

nm-research committed on Jan 20, 2025

Update README.md

6043ef6
verified

nm-research committed on Jan 16, 2025

Update README.md

492f5db
verified

nm-research committed on Jan 16, 2025

Update README.md

b758978
verified

nm-research committed on Jan 16, 2025

Add model files

5ad1702

Shubhra Pandit committed on Jan 15, 2025

Create README.md

d53b1b3
verified

nm-research committed on Jan 9, 2025

initial commit

851803b
verified

nm-research committed on Jan 7, 2025
