majentik Model Garden

A curated collection of quantized open-weight models with inference-time KV-cache compression. Every model keeps upstream tokenizers and architectures; the only thing we change is how the weights and KV cache are stored during generation.

307 repositories · 12 families · 6 quantization lanes

What this garden is for

Running bigger models on the laptop you already have. Every release combines a standard weight-quantization format (GGUF, MLX, or AWQ) with one of two KV-cache compressors:

| Compressor | What it does | When to use |
|---|---|---|
| RotorQuant | Rotational isotropic KV-cache compression | Long-context work; 2–4× KV memory savings with minimal drift |
| TurboQuant | Turbo variant targeted at throughput | Short-context, high-throughput serving |

Both compressors are applied at inference time and compose with any weight-quantized file in this garden: you mix and match.
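Neither compressor's programmatic interface is documented on this card, so the snippet below is only a conceptual sketch of why inference-time KV-cache compression composes with any weight format: cache blocks are quantized as they are written and dequantized as attention reads them, independently of how the weights are stored. It uses plain uniform per-block quantization and hypothetical helper names, not RotorQuant's actual rotational scheme.

```python
# Conceptual sketch only: uniform per-block KV quantization with hypothetical
# helpers (quantize_kv / dequantize_kv). This is NOT the RotorQuant algorithm.
import numpy as np

def quantize_kv(block: np.ndarray, bits: int = 4):
    """Quantize one KV-cache block to `bits` bits per value (stored as int8)."""
    scale = np.abs(block).max() / (2 ** (bits - 1) - 1)
    q = np.round(block / scale).astype(np.int8)   # what actually sits in the cache
    return q, scale

def dequantize_kv(q: np.ndarray, scale: float) -> np.ndarray:
    """Restore an approximate float block when attention reads the cache."""
    return q.astype(np.float32) * scale

# New key/value tensors are compressed as they are appended to the cache,
# regardless of whether the weights are GGUF, MLX, or AWQ on disk.
kv = np.random.randn(1, 8, 128).astype(np.float32)   # (batch, heads, head_dim)
q, scale = quantize_kv(kv, bits=4)
recovered = dequantize_kv(q, scale)
print("max abs reconstruction error:", float(np.abs(kv - recovered).max()))
```

The point of the sketch is the separation of concerns: weight quantization decides what is read from disk, KV-cache compression decides what is kept in memory per generated token, and the two never touch the same tensors.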

Families

| Family | Repos | Notes |
|---|---|---|
| Gemma 4 | 127 | E2B / E4B / 26B-A4B / 31B, base + instruct |
| Nemotron | 41 | Nano 4B + Super (Thinking + Base) variants |
| Qwen 3.5 | 28 | 27B dense + 397B-A17B MoE |
| GPT-OSS | 28 | 20B and 120B |
| Voxtral | 24 | ASR + voice chat, 3 sub-families |
| MERaLiON | 30 | 2 (20 repos) and 3 (10 repos); ASR + multimodal |
| MiniMax M2.7 | 9 | Mixed quantization lanes |
| Mistral Small 4 | 8 | Instruct + reasoning |
| Leanstral | 8 | Distilled Mistral reasoning variant |
| DeepSeek V3.2 | 2 | Mostly upstream, KV-quant wrappers |

Quantization lanes

Every model lands in one or more of these lanes (the README for each repo specifies which):

  • GGUF: Q2_K, Q3_K_M, IQ4_XS, Q4_K_M, Q5_K_M, Q8_0. Load with llama.cpp, ollama, LM Studio, or any GGUF-compatible runtime.
  • MLX: 2-bit, 4-bit, 8-bit. Targets Apple Silicon. pip install mlx-lm, point it at the repo, done (see the sketch after this list).
  • AWQ: 4-bit and 8-bit. Targets CUDA GPUs with vLLM or autoawq.
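For the MLX lane, loading should look roughly like the following, assuming a recent mlx-lm release; the repo id is a placeholder, not a real model in this garden.

```python
# Apple Silicon, MLX lane. Requires: pip install mlx-lm
from mlx_lm import load, generate

# Placeholder repo id: substitute the model card you actually picked.
model, tokenizer = load("majentik/<model>-4bit-mlx")

text = generate(
    model,
    tokenizer,
    prompt="Explain KV-cache compression in one sentence.",
    max_tokens=64,
)
print(text)
```

The GGUF and AWQ lanes work the same way conceptually: hand the repo (or the downloaded file) to llama.cpp/ollama or vLLM/autoawq respectively and generate as usual.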

Pick a starting point

  • I have a MacBook with 16 GB RAM → try a 4-bit MLX variant of Gemma 4 E4B.
  • I have a 24 GB GPU → try an AWQ-4bit Qwen3.5-27B with RotorQuant (a vLLM sketch follows this list).
  • I need 128k context on modest hardware → any GGUF + RotorQuant.
  • I want to compare → each repo card links to the corresponding upstream model, so perplexity drift is one eval_hf.py run away.
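For the 24 GB GPU path, a minimal vLLM invocation is sketched below. The repo id is a placeholder, and this card does not document how RotorQuant is switched on, so the sketch only covers loading the AWQ weights.

```python
# CUDA GPU, AWQ lane, served with vLLM.
from vllm import LLM, SamplingParams

# Placeholder repo id: substitute the AWQ model card you chose.
llm = LLM(model="majentik/<model>-awq", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize the benefits of 4-bit weight quantization."], params)
print(outputs[0].outputs[0].text)
```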

What is not in each repo

  • Training data. These are quantization-only releases. The base model's training data is upstream's concern; we inherit the upstream license and disclaimers.
  • Benchmarks for every axis. We publish per-lane perplexity on WikiText-2 plus family-specific evals (MMLU for reasoning, LibriSpeech for ASR). If you want an axis we haven't measured yet, open a discussion on the relevant repo.

Who we are

majentik publishes these as a side project to keep our own fleet running cheaply on commodity hardware and to close the gap between research releases and "can I actually run this tonight?". Issues, quant requests, and benchmark PRs are welcome.

Contact

  • Discussions: use the Community tab on the specific model repo.
  • Hardware donations / compute partnerships: majentik on Hugging Face.
  • Everything else: open a discussion on the closest repo; we'll see it.

Versioning

Each repo version tracks upstream@base-model-revision × quant-lane. When upstream ships a new base revision, we re-run the quant lane and bump the repo version. Card changes (docs, benchmarks) do not bump the version.

License

Each repo inherits the base model's license, not this organization-level license. Check the license field in the repository's card before deploying.
