ineso22 committed on
Commit
d8af6da
·
verified ·
1 Parent(s): e7c3ade

Delete docs/mlx_deploy_guide.md with huggingface_hub

Files changed (1)
  1. docs/mlx_deploy_guide.md +0 -70
docs/mlx_deploy_guide.md DELETED
@@ -1,70 +0,0 @@
- ## MLX deployment guide
-
- Run, serve, and fine-tune [**MiniMax-M2.1**](https://huggingface.co/MiniMaxAI/MiniMax-M2.1) locally on your Mac using the **MLX** framework. This guide gets you up and running quickly.
-
- > **Requirements**
- > - Apple Silicon Mac (M3 Ultra or later)
- > - **At least 256GB of unified memory (RAM)**
-
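The 256GB requirement above can be sanity-checked with rough arithmetic: weight memory is approximately the parameter count times the bits per weight, divided by 8. The sketch below assumes about 230 billion total parameters for MiniMax-M2.1, which this guide does not state (check the model card), and it ignores quantization scales, the KV cache, and activations, so real usage will be higher. `estimate_weight_gb`, `variants_that_fit`, and the 0.8 headroom factor are hypothetical illustrations, not part of `mlx-lm`.

```python
# Rough weight-memory estimate for each quantization level.
# Assumption: ~230e9 total parameters (not stated in this guide).
ASSUMED_TOTAL_PARAMS = 230e9

# Nominal bits per weight for the variants named in the guide.
BITS_PER_WEIGHT = {"4bit": 4, "6bit": 6, "8bit": 8, "bfloat16": 16}


def estimate_weight_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def variants_that_fit(num_params: float, memory_gb: float, headroom: float = 0.8):
    """List variants whose weights fit in `headroom` * `memory_gb`.

    The headroom factor is an illustrative allowance for the OS,
    KV cache, and activations, not a measured value.
    """
    budget = memory_gb * headroom
    return [
        name
        for name, bits in BITS_PER_WEIGHT.items()
        if estimate_weight_gb(num_params, bits) <= budget
    ]


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        size = estimate_weight_gb(ASSUMED_TOTAL_PARAMS, bits)
        print(f"{name:>8}: ~{size:.0f} GB of weights")
    print("Fits in 256 GB:", variants_that_fit(ASSUMED_TOTAL_PARAMS, 256))
```

Under these assumptions only the lower-bit variants leave room on a 256GB machine, which is consistent with the guide using the 4-bit model in its examples.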
-
- **Installation**
-
- Install the `mlx-lm` package via pip:
-
- ```bash
- pip install -U mlx-lm
- ```
-
- **CLI**
-
- Generate text directly from the terminal:
-
- ```bash
- mlx_lm.generate \
-   --model mlx-community/MiniMax-M2.1-4bit \
-   --prompt "How tall is Mount Everest?"
- ```
-
- > Add `--max-tokens 256` to cap the response length, or `--temp 0.7` to set the sampling temperature (higher values give more varied output).
-
- **Python Script Example**
-
- Use `mlx-lm` in your own Python scripts:
-
- ```python
- from mlx_lm import load, generate
- from mlx_lm.sample_utils import make_sampler
-
- # Load the quantized model (downloaded from the Hugging Face Hub on first use)
- model, tokenizer = load("mlx-community/MiniMax-M2.1-4bit")
-
- prompt = "Hello, how are you?"
-
- # Apply chat template if available (recommended for chat models)
- if tokenizer.chat_template is not None:
-     messages = [{"role": "user", "content": prompt}]
-     prompt = tokenizer.apply_chat_template(
-         messages,
-         tokenize=False,
-         add_generation_prompt=True
-     )
-
- # Generate response
- # Recent mlx-lm releases take the temperature via a sampler, not a `temp` keyword
- response = generate(
-     model,
-     tokenizer,
-     prompt=prompt,
-     max_tokens=256,
-     sampler=make_sampler(temp=0.7),
-     verbose=True
- )
-
- print(response)
- ```
-
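The introduction mentions serving, but the guide shows no example of it. As a minimal sketch, assuming the model has been started separately with `mlx_lm.server --model mlx-community/MiniMax-M2.1-4bit` and is listening on its default local address, the following builds and sends an OpenAI-style chat request using only the standard library. `build_chat_request` and `ask` are hypothetical helper names, and the address is an assumption you should adjust to match your server.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, max_tokens=256, temperature=0.7, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("How tall is Mount Everest?")` would return the generated reply as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.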
- **Tips**
- - **Model variants**: See the [MLX community collection on Hugging Face](https://huggingface.co/collections/mlx-community/minimax-m2.1) for the `MiniMax-M2.1-4bit`, `6bit`, `8bit`, and `bfloat16` versions.
- - **Fine-tuning**: Use `mlx_lm.lora` for parameter-efficient fine-tuning (PEFT) with LoRA.
-
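The fine-tuning tip above does not say what the training data should look like. As a sketch, assuming the chat-style JSONL layout that `mlx-lm` documents for LoRA training (a data directory containing `train.jsonl` and `valid.jsonl`, one `{"messages": [...]}` object per line), the hypothetical helper below converts question and answer pairs into that layout. The function name and the 10% validation split are illustrative choices.

```python
import json
from pathlib import Path


def write_chat_dataset(examples, out_dir, valid_fraction=0.1):
    """Write (question, answer) pairs as chat-format train.jsonl and valid.jsonl.

    Returns a dict mapping each file name to the number of records written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # One chat record per example: a user turn followed by the assistant reply.
    records = [
        {
            "messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        for question, answer in examples
    ]

    # Hold out at least one record for validation.
    n_valid = max(1, int(len(records) * valid_fraction))
    splits = {"train.jsonl": records[n_valid:], "valid.jsonl": records[:n_valid]}

    for name, rows in splits.items():
        with open(out / name, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")

    return {name: len(rows) for name, rows in splits.items()}
```

After writing the files to, say, `./data`, training could then be started with something like `mlx_lm.lora --model mlx-community/MiniMax-M2.1-4bit --train --data ./data`; check `mlx_lm.lora --help` for the options available in your installed version.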
- **Resources**
- - GitHub: [https://github.com/ml-explore/mlx-lm](https://github.com/ml-explore/mlx-lm)
- - Models: [https://huggingface.co/mlx-community](https://huggingface.co/mlx-community)