Update README.md

---
library_name: transformers
base_model:
- MiniMaxAI/MiniMax-M2
---

# Building and Running the Experimental `minimax` Branch of `llama.cpp`

**Note:** This setup is experimental. The `minimax` branch will not work with the standard `llama.cpp`. Use it only for testing GGUF models with experimental features.

---

## System Requirements

- Ubuntu 22.04
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.8 or later
- CMake

---

## Installation Steps

### 1. Install CUDA Toolkit 12.8

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
```
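
Optionally, verify that the toolkit landed where the next step expects it (this assumes the package created the usual `/usr/local/cuda` symlink):

```bash
# Print the installed CUDA compiler version (should report release 12.8)
/usr/local/cuda/bin/nvcc --version
```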

### 2. Set Environment Variables

```bash
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
```
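
These exports only affect the current shell session. If you want them to persist, a minimal sketch (assuming bash is your login shell) is to append them to `~/.bashrc`:

```bash
# Make the CUDA environment available in future bash sessions
cat >> ~/.bashrc <<'EOF'
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
EOF
source ~/.bashrc
```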

### 3. Install Build Tools

```bash
sudo apt install cmake
```
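
CMake drives the build, but a C/C++ compiler is also required. On a fresh Ubuntu 22.04 system you may additionally need the following (an assumption if no toolchain is installed yet):

```bash
# GCC/G++ and make, which CMake uses to compile llama.cpp
sudo apt install -y build-essential
```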

### 4. Clone the Experimental Branch

```bash
git clone --branch minimax --single-branch https://github.com/cturan/llama.cpp.git
cd llama.cpp
```

### 5. Build the Project

```bash
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF
cmake --build . --config Release --parallel $(nproc --all)
```
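
If the configure step complains that `LLAMA_CUDA` is deprecated or produces a CPU-only build (whether this applies depends on how recent the fork's base tree is, which is an assumption here), upstream `llama.cpp` now uses `GGML_CUDA` instead:

```bash
# Alternative configure flag used by newer llama.cpp trees
cmake .. -DGGML_CUDA=ON -DLLAMA_CURL=OFF
```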

---

## Build Output

After the build is complete, the binaries will be located in:

```
llama.cpp/build/bin
```
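
As a quick sanity check, run the following from the directory where you cloned the repository (the `--version` flag is standard in `llama.cpp` builds; it is assumed this fork keeps it):

```bash
# List the produced binaries and confirm the server binary runs far enough to print its version
ls llama.cpp/build/bin
llama.cpp/build/bin/llama-server --version
```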

---

## Running the Model

Example command:

```bash
./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 32000 --reasoning-format auto
```

This configuration offloads the experts to the CPU, so approximately 16 GB of VRAM is sufficient.
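
Once the server is running, it exposes an OpenAI-compatible HTTP API, by default on `http://127.0.0.1:8080`. A minimal smoke test with `curl` (the prompt and `max_tokens` value are arbitrary examples):

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 128
      }'
```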

---

## Notes

- `--cpu-moe` enables CPU offloading for mixture-of-experts layers.
- `--jinja` activates the Jinja templating engine.
- Adjust `-c` (context length) and `-ngl` (GPU layers) according to your hardware.
- Ensure the model file (`minimax-m2-Q4_K.gguf`) is available in the working directory; see the download sketch after this list.
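
If you still need to fetch the GGUF, one option is the Hugging Face CLI. This is only a sketch: `<repo-id>` is a placeholder for whichever repository actually hosts `minimax-m2-Q4_K.gguf`.

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download <repo-id> minimax-m2-Q4_K.gguf --local-dir .
```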

---

All steps complete. The experimental CUDA-enabled build of `llama.cpp` is ready to use.