Instructions for using flwrlabs/Lizzy-7B-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use flwrlabs/Lizzy-7B-GGUF with llama-cpp-python:

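# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q4_K_M file from the Hub (requires huggingface-hub) and load it
llm = Llama.from_pretrained(
	repo_id="flwrlabs/Lizzy-7B-GGUF",
	filename="lizzy-7b-q4_k_m.gguf",
)

response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])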

llama.cpp

How to use flwrlabs/Lizzy-7B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/flwrlabs/Lizzy-7B-GGUF:Q4_K_M

vLLM

How to use flwrlabs/Lizzy-7B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "flwrlabs/Lizzy-7B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "flwrlabs/Lizzy-7B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
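The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the vLLM server above is listening on `localhost:8000` and returns the usual OpenAI chat-completions response shape; `build_chat_payload` and `chat` are illustrative helper names, not part of vLLM. The sampling values are the card's recommended `temperature` 0.6 and `top_p` 0.95.

```python
import json
import urllib.request

# Assumed endpoint: the OpenAI-compatible route exposed by `vllm serve` above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "flwrlabs/Lizzy-7B-GGUF"


def build_chat_payload(prompt: str, temperature: float = 0.6, top_p: float = 0.95) -> dict:
    """Build an OpenAI-style chat-completions request body (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # OpenAI-compatible servers put the text under choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` prints the reply. Since llama.cpp's server exposes the same `/v1/chat/completions` route, passing `base_url="http://localhost:8080/v1"` should also work against the llama.cpp commands above, though the `model` value may need adjusting.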

Ollama
How to use flwrlabs/Lizzy-7B-GGUF with Ollama:
```
ollama run hf.co/flwrlabs/Lizzy-7B-GGUF:Q4_K_M
```

Unsloth Studio

How to use flwrlabs/Lizzy-7B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for flwrlabs/Lizzy-7B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for flwrlabs/Lizzy-7B-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for flwrlabs/Lizzy-7B-GGUF to start chatting

Pi

How to use flwrlabs/Lizzy-7B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "flwrlabs/Lizzy-7B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
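If `~/.pi/agent/models.json` already lists other providers, replacing the file with the snippet above would drop them. The sketch below merges the `llama-cpp` entry into whatever is already there. It assumes the file uses exactly the top-level `providers` mapping shown above; `merge_provider` and `add_llama_cpp_provider` are hypothetical helper names.

```python
import copy
import json
from pathlib import Path

# Provider entry copied from the models.json snippet above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "flwrlabs/Lizzy-7B-GGUF:Q4_K_M"}],
}


def merge_provider(config: dict, name: str = "llama-cpp", provider: dict = LLAMA_CPP_PROVIDER) -> dict:
    """Return a copy of `config` with `provider` stored under providers[name]."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("providers", {})[name] = copy.deepcopy(provider)
    return merged


def add_llama_cpp_provider(path: Path = Path.home() / ".pi" / "agent" / "models.json") -> None:
    """Merge the llama-cpp provider into the config file, creating it if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n")
```

Calling `add_llama_cpp_provider()` once is enough; existing providers are preserved and only the `llama-cpp` key is added or overwritten.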

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use flwrlabs/Lizzy-7B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use flwrlabs/Lizzy-7B-GGUF with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "flwrlabs/Lizzy-7B-GGUF:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use flwrlabs/Lizzy-7B-GGUF with Docker Model Runner:
```
docker model run hf.co/flwrlabs/Lizzy-7B-GGUF:Q4_K_M
```

Lemonade

How to use flwrlabs/Lizzy-7B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull flwrlabs/Lizzy-7B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Lizzy-7B-GGUF-Q4_K_M

List all available models

lemonade list

Lizzy 7B GGUF Quantized Models

Quantized GGUF models for efficient CPU/GPU inference

Overview

This repository contains GGUF-quantized versions of Lizzy 7B, a reasoning-enhanced language model from Flower Labs with improved British knowledge and behavior.

Model Variants

| Quantization | File size | Quality retention (vs. f16) | Recommended use case |
|---|---|---|---|
| Q5_K_M ⭐ | 4.8 GB | 95% | Best balance of quality and size |
| Q4_K_M | 4.2 GB | 92% | Resource-constrained environments |
| Q8_0 | 7.2 GB | 99% | Near-lossless compression |
| Q6_K | 5.6 GB | 97% | Middle ground between Q5_K_M and Q8_0 |
| f16 | 13.6 GB | 100% | Maximum quality, benchmarking |
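Choosing a variant mostly comes down to memory. The sketch below encodes the table and picks the highest-retention file that fits a given budget. File size is only a lower bound on memory use: the KV cache (which grows with context length) and runtime buffers come on top, so the default 1.5 GB `headroom_gb` is an assumption to tune, not a measured figure. Scaling each size against f16 (16 bits per weight) also gives a rough bits-per-weight estimate. Both function names are illustrative.

```python
from typing import Optional

# File size (GB) and quality retention (%) copied from the table above.
VARIANTS = {
    "Q4_K_M": (4.2, 92),
    "Q5_K_M": (4.8, 95),
    "Q6_K": (5.6, 97),
    "Q8_0": (7.2, 99),
    "f16": (13.6, 100),
}


def pick_variant(budget_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the highest-retention variant whose file plus headroom fits the budget."""
    fitting = [
        (retention, name)
        for name, (size_gb, retention) in VARIANTS.items()
        if size_gb + headroom_gb <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


def approx_bits_per_weight(name: str) -> float:
    """Estimate bits per weight by scaling against f16, which uses 16 bits per weight."""
    f16_size_gb, _ = VARIANTS["f16"]
    return VARIANTS[name][0] / f16_size_gb * 16


print(pick_variant(8.0))                          # Q6_K
print(round(approx_bits_per_weight("Q8_0"), 2))   # 8.47
```

The Q8_0 estimate of roughly 8.5 bits is consistent with that format's layout: 8-bit weights plus one f16 scale per block of 32 weights.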

Quick Start

Using llama.cpp (Recommended)

# Clone the llama.cpp fork with Lizzy support
git clone https://github.com/relogu/llama.cpp.git
cd llama.cpp
git checkout lorenzo-dev

# Build with CUDA support (CMake; the old Makefile/LLAMA_CUDA build is deprecated)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run inference with the recommended Q5_K_M quantization
./build/bin/llama-cli -m lizzy-7b-Q5_K_M.gguf \
       -p "What is the capital of England?" \
       -n 128 \
       --temp 0.6 \
       --top-p 0.95 \
       -ngl 32  # Offload all 32 transformer layers to the GPU

Using Python (llama-cpp-python)

from llama_cpp import Llama

# Load model with GPU offload
llm = Llama(
    model_path="lizzy-7b-Q5_K_M.gguf",
    n_ctx=65536,  # Full context
    n_gpu_layers=32,  # Offload to GPU
    n_threads=8,
)

# Generate with reasoning
response = llm(
    "Explain why British people queue so much.",
    max_tokens=512,
    temperature=0.6,
    top_p=0.95,
)

print(response["choices"][0]["text"])

Using Ollama

# Create Modelfile
cat > Modelfile << EOF
FROM ./lizzy-7b-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER num_ctx 65536
EOF

# Create and run
ollama create lizzy -f Modelfile
ollama run lizzy "What's the best way to make tea?"
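Once `ollama create` has registered the model as `lizzy`, it can also be called through Ollama's local REST API. A minimal standard-library sketch, assuming Ollama's default port 11434 and a non-streaming `/api/generate` request; `build_generate_payload` and `generate` are illustrative names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_payload(prompt: str, model: str = "lizzy") -> dict:
    """Request body for a single, non-streamed completion."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        # Optional: the Modelfile already sets these; shown for per-request overrides.
        "options": {"temperature": 0.6, "top_p": 0.95},
    }


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the request and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Non-streamed responses carry the generated text in the "response" field
        return json.load(response)["response"]
```

For example, `print(generate("What's the best way to make tea?"))` mirrors the `ollama run` command above.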

Usage with llama.cpp

Basic Inference

./build/bin/llama-cli -m lizzy-7b-Q5_K_M.gguf \
       -p "User: Hello, assistant!\nAssistant:" \
       -n 256 \
       --temp 0.6 \
       --top-p 0.95 \
       -ngl 32

Chat Mode

./build/bin/llama-cli -m lizzy-7b-Q5_K_M.gguf \
       -ngl 32 \
       --temp 0.6 \
       --top-p 0.95

Server Mode (API)

./build/bin/llama-server -m lizzy-7b-Q5_K_M.gguf \
         -ngl 32 \
         --port 8080 \
         --host 0.0.0.0

Then open the web UI at http://localhost:8080, or call the API directly:

curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Why do British people say sorry so often?",
    "n_predict": 256,
    "temperature": 0.6,
    "top_p": 0.95
  }'
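The same `/completion` request from Python, using only the standard library. This sketch assumes the server started above is on port 8080; llama.cpp's native `/completion` route returns the generated text in the `content` field of its JSON response. `build_completion_payload` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080/completion"  # llama-server started above


def build_completion_payload(prompt: str, n_predict: int = 256) -> dict:
    """Same fields as the curl example: token budget plus sampling settings."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.6, "top_p": 0.95}


def complete(prompt: str, url: str = SERVER_URL, timeout: float = 300.0) -> str:
    """POST to the native /completion route and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_completion_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["content"]
```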

Reasoning Behavior

Lizzy 7B is a reasoning model that uses thinking tokens. You'll see output like:

> Let me think about this question about British culture...
> The user is asking about queuing behavior...
> I should explain the cultural significance...

British people queue because it reflects core cultural values of fairness and order...

This is expected behavior: the lines prefixed with > show the model's reasoning, and the final answer follows them.
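When post-processing output, it can help to separate the reasoning from the answer. A minimal sketch, assuming the format shown above where each reasoning line starts with `>` and the answer follows; if your runtime renders thinking tokens differently (for example with explicit delimiter tags), the prefix check needs adapting. `split_reasoning` is a hypothetical helper.

```python
def split_reasoning(text: str) -> tuple:
    """Split model output into (reasoning, answer) using the '>' line prefix."""
    reasoning = []
    answer = []
    for line in text.splitlines():
        stripped = line.strip()
        if not answer and not stripped:
            continue  # skip blank lines before the answer starts
        if not answer and stripped.startswith(">"):
            reasoning.append(stripped[1:].strip())  # drop the '>' marker
        else:
            answer.append(line)  # everything from the first unprefixed line on
    return "\n".join(reasoning), "\n".join(answer).strip()
```

Once the answer has started, later lines beginning with `>` (for example a quotation) stay in the answer.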

Documentation

The following sections cover the model architecture, when to choose GGUF over the original format, licensing, citation, and support.

Architecture Details

Base: Lizzy 7B
Layers: 32 (with post-norm architecture)
Hidden size: 4096
Attention: Sliding window (4096) + full attention
RoPE: YaRN scaling (factor=8.0, original=8192)
Vocab: 100,278 tokens
Context: 65,536 tokens
Tensors: 355 (including attn_post_norm and ffn_post_norm)
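Two of these figures are directly related: YaRN extends the original 8,192-token RoPE context by the scaling factor, and 8,192 × 8.0 = 65,536 matches the listed context length. The sketch below checks that arithmetic and illustrates what a 4,096-token sliding window means for a causal attention mask. Which layers use the sliding window and which use full attention is not specified here, and the exact window boundary convention varies between implementations, so `can_attend` illustrates the general technique rather than Lizzy's actual layer layout.

```python
from typing import Optional

ORIGINAL_CONTEXT = 8192
YARN_FACTOR = 8.0
SLIDING_WINDOW = 4096

# YaRN scales the original RoPE context window by the factor.
extended_context = int(ORIGINAL_CONTEXT * YARN_FACTOR)


def can_attend(query_pos: int, key_pos: int, window: Optional[int]) -> bool:
    """Causal mask entry: may the token at query_pos see the token at key_pos?

    window=None means full causal attention; otherwise only the most recent
    `window` positions (including the query itself) are visible.
    """
    if key_pos > query_pos:
        return False  # causal: never look ahead
    return window is None or query_pos - key_pos < window


print(extended_context)  # 65536
```

For instance, a sliding-window layer at position 5,000 cannot see the first token, while a full-attention layer can; the sliding-window layers' attention cost stays bounded by the window size regardless of context length.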

Model Comparison

When to Use GGUF vs. Original Format

Use GGUF when:

✅ You need CPU inference
✅ You want flexible GPU offloading
✅ You need a smaller model size
✅ You're using the llama.cpp ecosystem
✅ You want fast loading times

Use original Safetensors when:

✅ You need full precision (BF16)
✅ You're using transformers/vLLM
✅ You need tensor parallelism
✅ You're fine-tuning the model

License

These GGUF models are derived from Lizzy 7B. Please refer to the base model's license for redistribution terms.

Base Model: flwrlabs/Lizzy-7B

Citation

If you use Lizzy 7B in your research, please cite:

@misc{lizzy-7b-gguf,
  title = {Lizzy 7B},
  author = {{Flower Labs}},
  year = {2026},
  url = {https://huggingface.co/flwrlabs/Lizzy-7B-GGUF}
}

Support

📚 Documentation: See the files in this Hugging Face repository
🐛 Issues: Report them on Hugging Face
💬 Discussions: Hugging Face community forum

Developed by Flower Labs

Format: GGUF
Model size: 7B params
Architecture: lizzy
Hardware compatibility: 4-bit, 5-bit, 6-bit, 8-bit