How to use from llama.cpp

Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf devingulliver/mamba-gguf:
# Run inference directly in the terminal:
llama-cli -hf devingulliver/mamba-gguf:

Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf devingulliver/mamba-gguf:
# Run inference directly in the terminal:
llama-cli -hf devingulliver/mamba-gguf:

Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf devingulliver/mamba-gguf:
# Run inference directly in the terminal:
./llama-cli -hf devingulliver/mamba-gguf:

Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf devingulliver/mamba-gguf:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf devingulliver/mamba-gguf:

Use Docker
docker model run hf.co/devingulliver/mamba-gguf:
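
Once llama-server is running, it can be queried over HTTP. The following is a minimal sketch, assuming the server is listening on localhost port 8080 and that the OpenAI-style /v1/completions route is the one you want (these are base models, so a plain text completion is a more natural fit than a chat request). The helper name completion_body, the BASE_URL variable, and the example prompt are illustrative and not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Assumption: llama-server is reachable here; override BASE_URL if not.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Build the JSON body for a plain text-completion request.
# The prompt is inserted verbatim, so it must not contain double
# quotes or backslashes (use a JSON tool such as jq for arbitrary text).
completion_body() {
  local prompt="$1" max_tokens="${2:-32}"
  printf '{"prompt": "%s", "max_tokens": %s}' "$prompt" "$max_tokens"
}

# Send the request only if the server's health endpoint answers.
if curl -sf -m 2 "$BASE_URL/health" >/dev/null 2>&1; then
  curl -s "$BASE_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(completion_body "The capital of France is" 16)"
else
  echo "llama-server not reachable at $BASE_URL" >&2
fi
```

If the server was started on a different host or port, set BASE_URL before running the script.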

Mamba GGUF

These are the Mamba base models, converted to GGUF for use with llama.cpp, in a variety of precisions (2, 3, 4, 5, 6, 8, 16, and 32-bit).

To download a model, click "Files and versions" at the top of the page, choose your desired model size, and then click the "📦LFS" button next to your desired quantization.

Here is a table adapted from TheBloke explaining the various precisions:

| Quant method | Use case |
| --- | --- |
| Q2_K | significant quality loss - not recommended for most purposes |
| Q3_K_S | very small, high quality loss |
| Q3_K_M | very small, high quality loss |
| Q3_K_L | small, substantial quality loss |
| Q4_0 | legacy; small, very high quality loss - prefer using Q3_K_M |
| Q4_K_S | small, greater quality loss |
| Q4_K_M | medium, balanced quality - recommended |
| Q5_0 | legacy; medium, balanced quality - prefer using Q4_K_M |
| Q5_K_S | large, low quality loss - recommended |
| Q5_K_M | large, very low quality loss - recommended |
| Q6_K | very large, extremely low quality loss |
| Q8_0 | very large, extremely low quality loss - not recommended |
| F16 | half precision - almost identical to the original |
| F32 | original precision - recommended by the Mamba authors |
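
To get a feel for what the size labels in the table mean in practice, file size can be roughly estimated as parameters × bits per weight ÷ 8. The sketch below assumes a round figure of one billion parameters purely for illustration, and the function name approx_size_mb is made up for this example. Treat the results as a lower bound: quantized formats also store scaling data and keep some tensors at higher precision, so real files are somewhat larger than the nominal bit width suggests.

```shell
#!/usr/bin/env bash
# Assumption: 1,000,000,000 parameters (illustrative round number).
PARAMS=1000000000

# Print the approximate size in MB (10^6 bytes) for a nominal bit width.
approx_size_mb() {
  local bits="$1"
  echo $(( PARAMS * bits / 8 / 1000000 ))
}

# Walk through the nominal bit widths offered in this repository.
for bits in 2 3 4 5 6 8 16 32; do
  printf '%2s-bit: ~%s MB\n' "$bits" "$(approx_size_mb "$bits")"
done
```

Under these assumptions a 4-bit file comes out at about 500 MB and the 32-bit original at about 4000 MB, which is the trade-off the table describes in words.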

Format: GGUF
Model size: 1B params
Architecture: mamba