Minimalism Usage
Quick Start
1. Install dependencies
pip install mlx-lm
2. Start the server
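# Serve the base model with this adapter applied.
# "--adapter-path ." points at the current directory, so run this from the folder that holds the adapter files.
python -m mlx_lm.server \
  --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
  --adapter-path . \
  --host 127.0.0.1 \
  --port 8080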
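The server may need some time to load the model before it accepts requests. As a minimal sketch, a script could wait for the port to open before sending anything. The helper name `wait_for_port` and its defaults are hypothetical; only the host and port come from the command above.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8080, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; pause briefly and retry.
            time.sleep(interval)
    return False
```

This only checks that the port accepts connections. It does not confirm that the model has finished loading or that the endpoint answers correctly.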
3. Test with curl
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Minimalism",
    "messages": [
      {"role": "user", "content": "Write a Python function to add two numbers"}
    ],
    "max_tokens": 256
  }'
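The same request can be sent from Python using only the standard library. This is a sketch, not an official client: the helper names `build_request` and `ask` are hypothetical, and the reply is assumed to follow the OpenAI-style layout (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumed to match the --host/--port values used when starting the server.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_request(prompt, max_tokens=256, url=SERVER_URL):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": "Minimalism",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, max_tokens=256, url=SERVER_URL):
    """Send the prompt and return the reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    with urllib.request.urlopen(build_request(prompt, max_tokens, url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Write a Python function to add two numbers"))` should print the model's reply.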
Response Format
Minimalism provides runnable-first responses with these sections:
- Solution: Main implementation
- Usage: Smallest runnable example
- Sanity test: Tiny test snippet (when appropriate)
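If you want to handle these sections separately, a reply can be split on its heading lines. The sketch below assumes each section starts with a line containing only its name, optionally wrapped in markdown markers or followed by a colon. That markup is a guess, not documented behavior, so adjust the pattern to what the model actually emits. The helper name `split_sections` is hypothetical.

```python
import re

# Section names come from the list above; the heading markup is an assumption.
SECTION_NAMES = ("Solution", "Usage", "Sanity test")

# Matches lines such as "## Solution", "**Usage:**" or "Sanity test:".
_HEADING = re.compile(
    r"^[#*\s]*(Solution|Usage|Sanity test)\s*[:*]*\s*$", re.IGNORECASE
)


def split_sections(text):
    """Return {section name: body text} for each recognised heading line."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = _HEADING.match(line)
        if match:
            # Normalise to the canonical spelling from SECTION_NAMES.
            current = next(
                n for n in SECTION_NAMES if n.lower() == match.group(1).lower()
            )
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Text before the first heading is ignored.
    return {name: "\n".join(body).strip("\n") for name, body in sections.items()}
```

Sections the model omits, such as the sanity test, are simply absent from the returned dictionary.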