How to use leonardo-rocha/llama2-7b-hf-chat with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("leonardo-rocha/llama2-7b-hf-chat")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
How to use leonardo-rocha/llama2-7b-hf-chat with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "leonardo-rocha/llama2-7b-hf-chat"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (the port is set explicitly so that it matches the curl call below)
mlx_lm.server --model "leonardo-rocha/llama2-7b-hf-chat" --port 8000
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "leonardo-rocha/llama2-7b-hf-chat",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
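The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI chat-completions shape rather than being confirmed by this model card.

```python
import json
import urllib.request


def build_chat_request(model, user_message, base_url="http://localhost:8000"):
    """Build a POST request for the server's chat-completions endpoint.

    base_url is an assumption: it must match the host and port the
    server was actually started on.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; adjust the key path if the
    server returns a different structure.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires the server to be running):
# request = build_chat_request("leonardo-rocha/llama2-7b-hf-chat", "Hello")
# print(send_chat_request(request))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.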
Llama 2 7B Chat 4-bit
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, in npz format suitable for use in Apple's MLX framework.
Weights have been converted from the original bfloat16 type to float16, because NumPy does not support bfloat16 out of the box.
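To illustrate what such a conversion involves, here is a minimal sketch using only Python's `struct` module. It is not the script used to produce these weights; the function names are hypothetical. It relies on two facts: a bfloat16 value is the upper 16 bits of a float32, and `struct` supports IEEE 754 half precision through the `e` format code.

```python
import struct


def bfloat16_bits_to_float(bits):
    """Decode a 16-bit bfloat16 pattern by padding it to a float32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def float_to_float16_bits(value):
    """Encode a Python float as an IEEE 754 half-precision bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def float16_bits_to_float(bits):
    """Decode an IEEE 754 half-precision bit pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bfloat16_to_float16_bits(bits):
    """Convert one bfloat16 bit pattern to a float16 bit pattern."""
    return float_to_float16_bits(bfloat16_bits_to_float(bits))


# 0x3F80 is 1.0 in bfloat16; 0x3C00 is 1.0 in float16.
one_as_float16 = bfloat16_to_float16_bits(0x3F80)
```

bfloat16 keeps fewer mantissa bits than float16 but covers a much wider exponent range, so values within float16's normal range convert exactly, while very large or very small magnitudes cannot be represented.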
How to use with MLX.
# Install mlx, mlx-examples, huggingface-cli
pip install mlx
pip install huggingface_hub hf_transfer
git clone https://github.com/ml-explore/mlx-examples.git
# Download model
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download --local-dir Llama-2-7b-chat-mlx/ mlx-community/Llama-2-7b-chat-4-bit
# Run example
python mlx-examples/llms/llama/llama.py --prompt "My name is " --model-path Llama-2-7b-chat-mlx/
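After the download, it can be useful to check which arrays the npz file contains. An npz file is a zip archive holding one `.npy` entry per array, so the names can be listed with the standard library alone. This is a sketch; the helper `list_npz_arrays` is hypothetical, and the file name in the usage comment is an assumption about what the download directory contains.

```python
import zipfile


def list_npz_arrays(path):
    """Return the sorted array names stored in an .npz archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


# Example usage (the file name "weights.npz" is an assumption):
# for name in list_npz_arrays("Llama-2-7b-chat-mlx/weights.npz"):
#     print(name)
```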
Please refer to the original model card for details on Llama 2.