Instructions for using abacusai/Smaug-Llama-3-70B-Instruct with libraries, inference servers, and local apps.

Libraries

How to use abacusai/Smaug-Llama-3-70B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="abacusai/Smaug-Llama-3-70B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("abacusai/Smaug-Llama-3-70B-Instruct")
model = AutoModelForCausalLM.from_pretrained("abacusai/Smaug-Llama-3-70B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use abacusai/Smaug-Llama-3-70B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "abacusai/Smaug-Llama-3-70B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "abacusai/Smaug-Llama-3-70B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
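
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "abacusai/Smaug-Llama-3-70B-Instruct"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` should print the model's answer. Pass a different `base_url` if the server listens on another host or port.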

Use Docker

docker model run hf.co/abacusai/Smaug-Llama-3-70B-Instruct

SGLang

How to use abacusai/Smaug-Llama-3-70B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "abacusai/Smaug-Llama-3-70B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "abacusai/Smaug-Llama-3-70B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "abacusai/Smaug-Llama-3-70B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "abacusai/Smaug-Llama-3-70B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use abacusai/Smaug-Llama-3-70B-Instruct with Docker Model Runner:
```
docker model run hf.co/abacusai/Smaug-Llama-3-70B-Instruct
```

Smaug-Llama-3-70B-Instruct

Built with Meta Llama 3

This model was built by applying a new Smaug recipe, designed to improve performance on real-world multi-turn conversations, to meta-llama/Meta-Llama-3-70B-Instruct.

The model outperforms Llama-3-70B-Instruct substantially, and is on par with GPT-4-Turbo, on MT-Bench (see below).

EDIT: Smaug-Llama-3-70B-Instruct is currently the top open-source model on Arena-Hard! It is also nearly on par with Claude Opus (see below).

We are conducting additional benchmark evaluations and will add those when available.

Model Description

Developed by: Abacus.AI
License: https://llama.meta.com/llama3/license/
Finetuned from model: meta-llama/Meta-Llama-3-70B-Instruct.

How to use

The prompt format is unchanged from Llama 3 70B Instruct.
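
For reference, the Llama 3 Instruct layout can be written out by hand. The sketch below follows the format Meta published for Llama 3 Instruct. The function name `format_llama3_prompt` is illustrative, and in practice `tokenizer.apply_chat_template` is preferable because it reads the template shipped with the model.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 Instruct prompt layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header cues the model to write its reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(format_llama3_prompt(messages))
```

Each turn ends with `<|eot_id|>`, which is why generation code for this model typically treats that token as a stop token in addition to the tokenizer's EOS token.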

Use with transformers

See the snippet below for usage with Transformers:

import transformers
import torch

model_id = "abacusai/Smaug-Llama-3-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

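# Render the chat messages into a single prompt string using the model's chat template
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the tokenizer's EOS token or Llama 3's end-of-turn token
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# The pipeline returns prompt + completion; slice off the prompt to print only the reply
print(outputs[0]["generated_text"][len(prompt):])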

Evaluation

Arena-Hard

Scores versus selected other models (source: https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge). GPT-4o and Gemini-1.5-pro-latest were missing from the original blog post, so we produced those numbers from a local run using the same methodology.

| Model | Score | 95% Confidence Interval | Average Tokens |
|---|---|---|---|
| GPT-4-Turbo-2024-04-09 | 82.6 | (-1.8, 1.6) | 662 |
| GPT-4o | 78.3 | (-2.4, 2.1) | 685 |
| Gemini-1.5-pro-latest | 72.1 | (-2.3, 2.2) | 630 |
| Claude-3-Opus-20240229 | 60.4 | (-3.3, 2.4) | 541 |
| Smaug-Llama-3-70B-Instruct | 56.7 | (-2.2, 2.6) | 661 |
| GPT-4-0314 | 50.0 | (-0.0, 0.0) | 423 |
| Claude-3-Sonnet-20240229 | 46.8 | (-2.1, 2.2) | 552 |
| Llama-3-70B-Instruct | 41.1 | (-2.5, 2.4) | 583 |
| GPT-4-0613 | 37.9 | (-2.2, 2.0) | 354 |
| Mistral-Large-2402 | 37.7 | (-1.9, 2.6) | 400 |
| Mixtral-8x22B-Instruct-v0.1 | 36.4 | (-2.7, 2.9) | 430 |
| Qwen1.5-72B-Chat | 36.1 | (-2.5, 2.2) | 474 |
| Command-R-Plus | 33.1 | (-2.1, 2.2) | 541 |
| Mistral-Medium | 31.9 | (-2.3, 2.4) | 485 |
| GPT-3.5-Turbo-0613 | 24.8 | (-1.6, 2.0) | 401 |
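
One rough way to read the confidence-interval column is to turn each row into an absolute range and check whether two ranges overlap. The sketch below assumes the interval column gives offsets relative to the score. Overlap is only a heuristic here, not a formal significance test, and the helper names are illustrative.

```python
def interval(score, ci):
    """Convert a score and its (lower, upper) offsets into an absolute range."""
    lower_offset, upper_offset = ci
    return (round(score + lower_offset, 1), round(score + upper_offset, 1))


def overlaps(a, b):
    """Return True if two closed ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the Arena-Hard table above.
smaug = interval(56.7, (-2.2, 2.6))
opus = interval(60.4, (-3.3, 2.4))
llama = interval(41.1, (-2.5, 2.4))

print("Smaug vs Claude-3-Opus overlap:", overlaps(smaug, opus))
print("Smaug vs Llama-3-70B-Instruct overlap:", overlaps(smaug, llama))
```

Under this reading, the Smaug and Claude-3-Opus ranges overlap, while the Smaug and Llama-3-70B-Instruct ranges do not.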

MT-Bench

########## First turn ##########
                                  score
model                      turn
Smaug-Llama-3-70B-Instruct 1      9.40000
GPT-4-Turbo                1      9.37500
Meta-Llama-3-70B-Instruct  1      9.21250
########## Second turn ##########
                                  score
model                      turn
Smaug-Llama-3-70B-Instruct 2      9.0125
GPT-4-Turbo                2      9.0000
Meta-Llama-3-70B-Instruct  2      8.8000
########## Average ##########
                              score
model
Smaug-Llama-3-70B-Instruct    9.206250
GPT-4-Turbo                   9.187500
Meta-Llama-3-70B-Instruct     9.006250

| Model | First turn | Second turn | Average |
|---|---|---|---|
| Smaug-Llama-3-70B-Instruct | 9.40 | 9.01 | 9.21 |
| GPT-4-Turbo | 9.38 | 9.00 | 9.19 |
| Meta-Llama-3-70B-Instruct | 9.21 | 8.80 | 9.01 |
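
The reported averages are consistent with an unweighted mean of the two per-turn scores. The sketch below recomputes them from the unrounded per-turn values in the raw output. That the average is a plain mean is an assumption inferred from the numbers, not something the evaluation output states.

```python
# Unrounded per-turn MT-Bench scores: (first turn, second turn).
turn_scores = {
    "Smaug-Llama-3-70B-Instruct": (9.4, 9.0125),
    "GPT-4-Turbo": (9.375, 9.0),
    "Meta-Llama-3-70B-Instruct": (9.2125, 8.8),
}

# Unweighted mean over the two turns.
averages = {
    model: (first + second) / 2
    for model, (first, second) in turn_scores.items()
}

for model, avg in averages.items():
    print(f"{model}: {avg:.5f}")
```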

OpenLLM Leaderboard Manual Evaluation

| Model | ARC | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K* | Average |
|---|---|---|---|---|---|---|---|
| Smaug-Llama-3-70B-Instruct | 70.6 | 86.1 | 79.2 | 62.5 | 83.5 | 90.5 | 78.7 |
| Llama-3-70B-Instruct | 71.4 | 85.7 | 80.0 | 61.8 | 82.9 | 91.1 | 78.8 |

*GSM8K: The GSM8K numbers quoted here were computed using a recent release of the LM Evaluation Harness. The commit used by the leaderboard has a bug in the stop word configuration for GSM8K that significantly affects models that tend to use ":" in their responses. The issue is covered in more detail in this GSM8K evaluation discussion. Because the updated harness fixes the stop word issue, the scores for both Llama-3 and this model differ significantly from those obtained with the leaderboard commit.
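
To illustrate why a stop word setting can matter so much, the toy sketch below truncates a response at the first occurrence of any stop sequence. It is not the harness's code. The helper `truncate_at_stop`, the sample response, and both stop lists are hypothetical and chosen only to show the effect.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


response = "Let's compute: 3 + 4 = 7. The answer is 7."

# Hypothetical stop list containing ":" cuts the response before the answer.
truncated = truncate_at_stop(response, [":"])

# Hypothetical stop list without ":" leaves the response, and the answer, intact.
intact = truncate_at_stop(response, ["\n\nQuestion"])

print(repr(truncated))
print(repr(intact))
```

In this toy case the truncated response no longer contains the final answer, so an answer extractor would score it as wrong even though the full response was correct.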

This version of Smaug uses new techniques and new data compared to Smaug-72B, and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.


Model size: 71B params (Safetensors)
Tensor type: BF16

Paper: Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive (arXiv 2402.13228, published Feb 20, 2024)