Instructions to use mlx-community/Meta-Llama-3-8B-Instruct-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/Meta-Llama-3-8B-Instruct-4bit with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm
    # if on a CUDA device, also pip install mlx[cuda]

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
    prompt = "Once upon a time in"
    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- MLX LM
How to use mlx-community/Meta-Llama-3-8B-Instruct-4bit with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Generate some text
    mlx_lm.generate --model "mlx-community/Meta-Llama-3-8B-Instruct-4bit" --prompt "Once upon a time"
Colab crashing when inferencing the model
#4
by smfaizanahmed - opened
I tried this model on Colab, with and without a GPU and also with the high-RAM runtime, but the session keeps crashing as soon as inference starts.
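The post does not say what the crash looks like or what causes it. To narrow that down, it may help to record what the runtime actually provides before loading the model. The following is only a minimal sketch using the Python standard library: the helper names `describe_runtime` and `estimate_weight_gib` are hypothetical, and the parameter count of 8e9 is a round-number assumption read off the "8B" in the model name, not a measured figure.

```python
import os
import platform


def describe_runtime():
    """Collect basic facts about the current machine (standard library only)."""
    info = {
        "system": platform.system(),    # operating system name, e.g. "Linux" or "Darwin"
        "machine": platform.machine(),  # CPU architecture, e.g. "x86_64" or "arm64"
        "python": platform.python_version(),
    }
    # Total physical RAM. os.sysconf only exists on Unix-like systems and
    # not every platform defines these names, so fall back to None.
    try:
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        info["ram_gib"] = round(total_bytes / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        info["ram_gib"] = None
    return info


def estimate_weight_gib(n_params, bits_per_weight):
    """Rough lower bound on weight storage in GiB, ignoring quantization
    scales, activations, and the KV cache."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    print(describe_runtime())
    # Assumption: about 8e9 parameters stored at 4 bits each.
    print(f"approx. weight size: {estimate_weight_gib(8e9, 4):.2f} GiB")
```

Posting this output together with the exact error message or the Colab runtime log would make the report easier to reproduce.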
smfaizanahmed changed discussion status to closed
smfaizanahmed changed discussion status to open