Instructions to use mlx-community/Mistral-7B-Instruct-v0.3-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/Mistral-7B-Instruct-v0.3-4bit with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Mistral-7B-Instruct-v0.3-4bit mlx-community/Mistral-7B-Instruct-v0.3-4bit
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
Limit of tokens
#4
by francostan - opened
Hi everyone, im facing this problems that all the ai response generated through generate() are limited on 256 tokens:
Prompt: 707 tokens, 180.446 tokens-per-sec
Generation: 256 tokens, 20.440 tokens-per-sec
Peak memory: 4.440 GB
Respuesta generada:
...
Anyone know how to change this limit, should be on load() ?