Invalid header in file model-00004-of-00008.safetensors
#1
by shanginn - opened
Hi there, hello. I tried the provided example, and got the following error:
Traceback (most recent call last):
  File "project/app.py", line 3, in <module>
    model, tokenizer = load("mlx-community/CodeLlama-70b-hf-4bit-MLX")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/mlx_lm/utils.py", line 279, in load
    model = load_model(model_path)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/mlx_lm/utils.py", line 220, in load_model
    weights.update(mx.load(wf))
                   ^^^^^^^^^^^
RuntimeError: [load] Invalid header in file ~/.cache/huggingface/hub/models--mlx-community--CodeLlama-70b-hf-4bit-MLX/snapshots/2fd732eb3a2685a22a03258b8191af356566c203/model-00004-of-00008.safetensors
I re-downloaded the files several times with the same result. I also tried downloading them by hand from the Files page in a browser and placing them in the cache folder manually, but I got the same error.
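One way to narrow down an "Invalid header" error like this is to look at the start of the file directly. A safetensors file begins with an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of JSON. The sketch below checks only that much; the function name `inspect_safetensors_header` and the `max_header_bytes` cap are hypothetical choices for illustration, not part of mlx-lm or safetensors, and passing this check does not guarantee that `mx.load` will accept the file.

```python
import json
import struct
from pathlib import Path


def inspect_safetensors_header(path, max_header_bytes=100_000_000):
    """Return (ok, message) after a basic sanity check of a safetensors header.

    Hypothetical helper: it verifies only the 8-byte length prefix and that
    the header bytes parse as a JSON object. The size cap is an arbitrary
    assumption used to reject obviously bogus lengths.
    """
    file_path = Path(path).expanduser()
    size = file_path.stat().st_size
    with open(file_path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False, f"file is only {size} bytes, too short for a header"
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > max_header_bytes or 8 + header_len > size:
            return False, f"implausible header length {header_len} (file size {size})"
        raw = f.read(header_len)
    try:
        header = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return False, f"header is not valid JSON: {exc}"
    if not isinstance(header, dict):
        return False, "header JSON is not an object"
    # "__metadata__" is an optional non-tensor entry in the header.
    tensor_count = sum(1 for key in header if key != "__metadata__")
    return True, f"header looks valid, {tensor_count} tensor entries"


# Example call (adjust the path to the shard reported in the error):
# print(inspect_safetensors_header("model-00004-of-00008.safetensors"))
```

If the check fails with an implausible length, it may be worth opening the file in a text editor: a small text file, such as a Git LFS pointer or an HTML error page saved in place of the weights, would fail in exactly this way.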
What am I doing wrong?