cturan committed (verified)
Commit 8f2fb93 · Parent: a44c65b

Update README.md

Files changed (1): README.md (+87 -25)
README.md CHANGED
@@ -5,32 +5,94 @@ library_name: transformers
  base_model:
  - MiniMaxAI/MiniMax-M2
  ---
- Test gguf for this model, will not work with standart llama.cpp, this is just experimental, https://github.com/cturan/llama.cpp/tree/minimax compile this.
-
- for example
-
- Ubuntu 22.04 cuda:
- wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
- sudo dpkg -i cuda-keyring_1.1-1_all.deb
- sudo apt-get update
- sudo apt-get -y install cuda-toolkit-12-8
- export CUDA_HOME=/usr/local/cuda
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
- export PATH=$PATH:$CUDA_HOME/bin
- apt install cmake
-
- git clone --branch minimax --single-branch https://github.com/cturan/llama.cpp.git
- cd llama.cpp
- mkdir build
- cd build
- cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF
- cmake --build . --config Release --parallel $(nproc --all)
-
- all done now you have binaries in llama.cpp/build/bin
-
- run it like
  ./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 32000 --reasoning-format auto
- this will offload experts to cpu so you just need 16gb vram
+ # Building and Running the Experimental `minimax` Branch of `llama.cpp`
+
+ **Note:**
+ This setup is experimental. The `minimax` branch will not work with the standard `llama.cpp`. Use it only for testing GGUF models with experimental features.
+
+ ---
+
+ ## System Requirements
+
+ - Ubuntu 22.04
+ - NVIDIA GPU with CUDA support
+ - CUDA Toolkit 12.8 or later
+ - CMake
+
+ ---
+
+ ## Installation Steps
+
+ ### 1. Install CUDA Toolkit 12.8
+
+ ```bash
+ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
+ sudo dpkg -i cuda-keyring_1.1-1_all.deb
+ sudo apt-get update
+ sudo apt-get -y install cuda-toolkit-12-8
+ ```
+
+ ### 2. Set Environment Variables
+
+ ```bash
+ export CUDA_HOME=/usr/local/cuda
+ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
+ export PATH=$PATH:$CUDA_HOME/bin
+ ```
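+
+ These exports apply only to the current shell. As an optional extra (not part of the original instructions), you can verify the toolkit is visible and persist the variables for future sessions:
+
+ ```bash
+ # Should report the CUDA 12.8 compiler once PATH is updated
+ nvcc --version
+
+ # Optional: carry the variables over to new shells
+ cat >> ~/.bashrc <<'EOF'
+ export CUDA_HOME=/usr/local/cuda
+ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
+ export PATH=$PATH:$CUDA_HOME/bin
+ EOF
+ ```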
+
+ ### 3. Install Build Tools
+
+ ```bash
+ sudo apt install cmake
+ ```
+
+ ### 4. Clone the Experimental Branch
+
+ ```bash
+ git clone --branch minimax --single-branch https://github.com/cturan/llama.cpp.git
+ cd llama.cpp
+ ```
+
+ ### 5. Build the Project
+
+ ```bash
+ mkdir build
+ cd build
+ cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF
+ cmake --build . --config Release --parallel $(nproc --all)
+ ```
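+
+ If your checkout tracks a newer upstream tree, CMake may warn that `LLAMA_CUDA` is deprecated; upstream renamed the option to `GGML_CUDA`. Only if the configure step complains, try the renamed flag (an assumption about your tree, not part of the original instructions):
+
+ ```bash
+ # Same build, using the renamed CUDA option from newer llama.cpp trees
+ cmake .. -DGGML_CUDA=ON -DLLAMA_CURL=OFF
+ cmake --build . --config Release --parallel $(nproc --all)
+ ```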
+
+ ---
+
+ ## Build Output
+
+ After the build is complete, the binaries will be located in:
+
+ ```
+ llama.cpp/build/bin
+ ```
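+
+ As a quick sanity check (my own suggestion, not one of the original steps), confirm the server binary was produced:
+
+ ```bash
+ # Run from the directory you cloned into; llama-server should appear in the listing
+ ls llama.cpp/build/bin
+
+ # Prints version/build information and exits
+ ./llama.cpp/build/bin/llama-server --version
+ ```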
+
+ ---
+
+ ## Running the Model
+
+ Example command:
+
+ ```bash
  ./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 32000 --reasoning-format auto
+ ```
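+
+ If the GGUF file is not already present locally, one way to fetch it is with the Hugging Face CLI (a sketch; `<repo-id>` is a placeholder for the repository hosting this file):
+
+ ```bash
+ # Download the quantized model into the current directory
+ huggingface-cli download <repo-id> minimax-m2-Q4_K.gguf --local-dir .
+ ```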
+
+ The command above offloads the experts to the CPU, so approximately 16 GB of VRAM is sufficient.
+
+ ---
+
+ ## Notes
+
+ - `--cpu-moe` keeps the mixture-of-experts weights on the CPU, which is what lowers the VRAM requirement.
+ - `--jinja` enables Jinja processing of the model's built-in chat template.
+ - Adjust `-c` (context length) and `-ngl` (GPU layers) to your hardware; see the example after this list.
+ - Ensure the model file (`minimax-m2-Q4_K.gguf`) is available in the working directory.
+
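+ For instance, on a GPU with less memory you might shrink the context window while keeping full expert offload (illustrative values, to be tuned per machine):
+
+ ```bash
+ # Smaller context reduces KV-cache memory; other flags unchanged
+ ./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 8192 --reasoning-format auto
+ ```
+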
+ ---
+
+ All steps complete. The experimental CUDA-enabled build of `llama.cpp` is ready to use.