Instructions to use Undi95/Meta-Llama-3-70B-Instruct-hf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Undi95/Meta-Llama-3-70B-Instruct-hf with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Undi95/Meta-Llama-3-70B-Instruct-hf")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Undi95/Meta-Llama-3-70B-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained("Undi95/Meta-Llama-3-70B-Instruct-hf")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
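
For readers wondering what apply_chat_template feeds the model, here is a minimal sketch of the standard Llama 3 Instruct prompt layout in plain Python. It assumes this repository ships the stock Llama 3 chat template. The helper name render_llama3_prompt is hypothetical and is not part of Transformers. In real code, keep using the tokenizer's own template.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the stock Llama 3 Instruct layout (illustrative only)."""
    # The beginning-of-text token is emitted once, before the first message.
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Who are you?"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

The trailing assistant header is what add_generation_prompt=True contributes. Without it, the prompt ends at the user's <|eot_id|> and the model has no cue to answer as the assistant.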

Local Apps

vLLM

How to use Undi95/Meta-Llama-3-70B-Instruct-hf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Undi95/Meta-Llama-3-70B-Instruct-hf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Undi95/Meta-Llama-3-70B-Instruct-hf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
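
The same request can be sent from Python. The sketch below builds it with the standard library only and mirrors the curl call above: same URL, header, and JSON body. The helper name build_chat_request is hypothetical. Sending is left commented out because it needs a running server on localhost:8000.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",
    "Undi95/Meta-Llama-3-70B-Instruct-hf",
    "What is the capital of France?",
)
print(request.full_url)

# Sending requires the vLLM server started above, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```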

Use Docker

docker model run hf.co/Undi95/Meta-Llama-3-70B-Instruct-hf

SGLang

How to use Undi95/Meta-Llama-3-70B-Instruct-hf with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Undi95/Meta-Llama-3-70B-Instruct-hf" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Undi95/Meta-Llama-3-70B-Instruct-hf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
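
The server answers with a JSON body in the OpenAI chat completions shape, where the reply text sits under choices[0].message.content. Here is a minimal sketch of pulling it out. The helper name extract_reply is hypothetical, and sample_body is a hand-written stand-in with placeholder text, not real model output.

```python
import json


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completions response."""
    data = json.loads(response_body)
    # The first choice carries the generated assistant message.
    return data["choices"][0]["message"]["content"]


# Hand-written stand-in for a server response (placeholder text, not model output).
sample_body = json.dumps({
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<reply text>"},
            "finish_reason": "stop",
        }
    ]
})
print(extract_reply(sample_body))
```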

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Undi95/Meta-Llama-3-70B-Instruct-hf" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Undi95/Meta-Llama-3-70B-Instruct-hf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Undi95/Meta-Llama-3-70B-Instruct-hf with Docker Model Runner:
```
docker model run hf.co/Undi95/Meta-Llama-3-70B-Instruct-hf
```

"We took great care to optimize helpfulness and safety."

by LLLLMAfficionado - opened Apr 18, 2024

Discussion

LLLLMAfficionado

Apr 18, 2024

Sounds like this is gonna be one Undi-Incompatible model full of censorship

jukofyork

Apr 19, 2024

At least it isn't codellama-70b-instruct level of "safety" - so safe it didn't want to write any code :D

Ainonake

Apr 19, 2024

> Sounds like this is gonna be one Undi-Incompatible model full of censorship

Judging by Reddit, it's the opposite: even with the assistant prompt, it answers a large number of requests that Llama 2 would always refuse.

The level of censorship is noticeably lower than in Llama 2. There are also few refusals in SillyTavern with a jailbreak.

Undi95

Owner Apr 19, 2024

We've already done a test finetune with @IkariDev, and despite the model being dumb (we trained on base), I could do some hardcore shit (ahhh... test phase lmao), so I think it will be possible.

LLLLMAfficionado

Apr 19, 2024

I've been doing some testing myself since starting the thread. Jailbreaking seems to let it loose, though sometimes, after perhaps 200 or so output tokens, it can suddenly refuse and keep echoing its refusal. In my personal opinion, MiquMaid-v3-70B beats it for the things I play with; I didn't find it to be smart at all.
