"We took great care to optimize helpfulness and safety."
Sounds like this is gonna be one Undi-Incompatible model full of censorship
At least it isn't codellama-70b-instruct level of "safety" - so safe it didn't want to write any code :D
"Sounds like this is gonna be one Undi-Incompatible model full of censorship"
On the contrary, judging by Reddit, even with the assistant prompt it answers a large number of requests that Llama 2 would never answer.
The level of censorship is noticeably lower than in Llama 2, and there are also few refusals in SillyTavern with a jailbreak.
I've been doing some testing myself since starting the thread. Jailbreaking seems to let it loose, though sometimes, after perhaps 200 or so output tokens, it can suddenly refuse and keep echoing its refusal. In my personal opinion, MiquMaid-v3-70B beats it for the things I play with; I didn't find this one to be smart at all.