Instructions to use ChocoLlama/ChocoLlama-2-7B-instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ChocoLlama/ChocoLlama-2-7B-instruct with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ChocoLlama/ChocoLlama-2-7B-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Cap the number of generated tokens and print the result
print(pipe(messages, max_new_tokens=40))
```
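
In recent Transformers versions, a text-generation pipeline given chat messages returns the whole conversation under `generated_text`, with the model's answer appended as the final message. The helper below is a sketch under that assumption; `last_assistant_reply` is a hypothetical name, not a Transformers function, and it is checked against a hand-written output with a placeholder reply so it runs without downloading the model.

```python
from typing import Any, Dict, List


def last_assistant_reply(pipe_output: List[Dict[str, Any]]) -> str:
    """Return the content of the final assistant message in a pipeline result.

    Hypothetical helper; assumes chat-style input, where "generated_text"
    holds the full list of messages.
    """
    # The pipeline returns one dict per input; we passed a single conversation.
    conversation = pipe_output[0]["generated_text"]
    # Walk backwards to find the most recent assistant turn.
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Hand-written example of the assumed output shape (placeholder reply, no model needed).
example_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "<model reply>"},
        ]
    }
]
print(last_assistant_reply(example_output))
```

With a real model, `last_assistant_reply(pipe(messages, max_new_tokens=40))` would return only the answer text instead of the whole conversation.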

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ChocoLlama/ChocoLlama-2-7B-instruct")
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained("ChocoLlama/ChocoLlama-2-7B-instruct", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Format the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
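
The `max_new_tokens=40` argument bounds how long `generate` can run even if the model never emits its end-of-sequence token, and the slice `outputs[0][inputs["input_ids"].shape[-1]:]` drops the prompt that `generate` returns in front of the new tokens. The toy sketch below shows both ideas with plain Python lists; `greedy_generate` and `toy_next_token` are hypothetical stand-ins, not how Transformers implements generation.

```python
from typing import Callable, List


def greedy_generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    """Append tokens until EOS is produced or max_new_tokens is reached.

    Like generate() for decoder-only models, the returned sequence
    starts with the prompt tokens.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_id:
            break  # the "model" signalled the end of its answer
    return ids


# Hypothetical stand-in for a model that never emits EOS.
def toy_next_token(ids: List[int]) -> int:
    return ids[-1] + 1


prompt = [1, 10, 11]
full = greedy_generate(prompt, toy_next_token, eos_id=2, max_new_tokens=5)
new_tokens = full[len(prompt):]  # same idea as outputs[0][input_len:]
print(new_tokens)
```

Because `toy_next_token` never returns the EOS id, only the `max_new_tokens` limit stops the loop, which is the situation a generation that "never finishes" typically ends up in.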

Notebooks
Google Colab
Kaggle

vLLM

How to use ChocoLlama/ChocoLlama-2-7B-instruct with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ChocoLlama/ChocoLlama-2-7B-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ChocoLlama/ChocoLlama-2-7B-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
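
The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library, assuming the vLLM server above is reachable at `http://localhost:8000`; the helper names are hypothetical, and the `max_tokens` field is an addition not present in the curl example, used here to cap the reply length.

```python
import json
import urllib.request

# Hypothetical helpers for an OpenAI-compatible chat endpoint such as the
# vLLM server started above (assumed to listen on http://localhost:8000).


def build_chat_request(
    base_url: str, model: str, prompt: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Build a POST request for {base_url}/v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Cap the reply length so a single request cannot run unbounded.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text from a chat-completions JSON response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one request and return the reply (requires a running server)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())


# Usage, once the server is up:
# print(ask("http://localhost:8000", "ChocoLlama/ChocoLlama-2-7B-instruct",
#           "What is the capital of France?"))
```

Keeping request construction and response parsing separate from the network call makes both parts easy to check without a GPU or a running server.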

Use Docker

```
docker model run hf.co/ChocoLlama/ChocoLlama-2-7B-instruct
```

SGLang

How to use ChocoLlama/ChocoLlama-2-7B-instruct with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ChocoLlama/ChocoLlama-2-7B-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ChocoLlama/ChocoLlama-2-7B-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
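
When a single generation takes a long time, streaming lets you watch tokens arrive instead of waiting for the full reply. OpenAI-compatible servers such as SGLang and vLLM accept `"stream": true` and answer with server-sent events; the sketch below assumes the common `data: {...}` / `data: [DONE]` event format with text in `choices[0].delta.content`. The function names are hypothetical, and only the parser is exercised offline.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text pieces from an OpenAI-style server-sent-event stream.

    Assumes each event arrives as a line "data: <json chunk>" and that the
    stream ends with "data: [DONE]".
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk usually carries only the role, so content may be missing
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece


def stream_chat(
    base_url: str, model: str, prompt: str, timeout: float = 300.0
) -> Iterator[str]:
    """Stream a reply from {base_url}/v1/chat/completions (needs a running server)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Iterating an HTTP response yields raw byte lines
        yield from iter_stream_text(line.decode("utf-8") for line in response)


# Usage, with the SGLang server above listening on port 30000:
# for piece in stream_chat("http://localhost:30000",
#                          "ChocoLlama/ChocoLlama-2-7B-instruct",
#                          "What is the capital of France?"):
#     print(piece, end="", flush=True)
```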

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ChocoLlama/ChocoLlama-2-7B-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ChocoLlama/ChocoLlama-2-7B-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
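
The container above has to download and load the weights before it can answer, so early requests may fail. A simple option is to poll the server until it responds; the sketch below assumes the OpenAI-compatible `GET /v1/models` endpoint is available on port 30000, and `server_is_up` and `wait_until_ready` are hypothetical helpers. The clock and sleep functions are injectable so the polling loop can be tested without a real server.

```python
import time
import urllib.request
from typing import Callable


def server_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200 (assumed endpoint)."""
    try:
        with urllib.request.urlopen(
            f"{base_url.rstrip('/')}/v1/models", timeout=timeout
        ) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() every interval_s seconds until it succeeds or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: server_is_up("http://localhost:30000"))` blocks for up to ten minutes before the first curl call is worth sending.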

Docker Model Runner
How to use ChocoLlama/ChocoLlama-2-7B-instruct with Docker Model Runner:
```
docker model run hf.co/ChocoLlama/ChocoLlama-2-7B-instruct
```

Help needed with ChocoLlama-Instruct models

by FrancescoPeriti - opened Feb 19, 2025

Discussion

FrancescoPeriti

Feb 19, 2025

Hello,
thanks for sharing this model! I have been trying to use it with both the provided code and other implementations, but unfortunately, I am encountering some issues. Specifically, the inference time for a single example is excessively long, to the point that I have had to terminate the execution. I have tested the model in different environments and even attempted fine-tuning, but the problem persists.

While I have had success fine-tuning other pre-trained models (like llama2chat and llama3instruct), the fine-tuned versions of chocollama*-instruct remain slow and produce erratic outputs, which appear to be a result of tokenization errors. I am wondering if there could be an issue during the model loading process. Could you kindly double-check the model or guide me further on how to use it? I have experienced the same issue with both this model and Llama-3-ChocoLlama-8B-Instruct.

Thank you in advance for your time and support!
