---
license: apache-2.0
base_model: flwrlabs/Lizzy-7B
tags:
  - llama-cpp
  - gguf
  - olmo2
  - quantized
  - uk-english
  - agentic
  - function-calling
  - lizzy-7B
language:
  - en
pipeline_tag: text-generation
---

# Lizzy-7B GGUF Quants

> 🚨 **Update:** Flower Labs has officially released their native GGUF quants.
> I highly recommend transitioning to their repository for the most stable inference and the corrected 32k context window: **[flwrlabs/Lizzy-7B-GGUF](https://huggingface.co/flwrlabs/Lizzy-7B-GGUF)**.
> 
> *Note: During testing, I found a RoPE/context-length bug in this early build, which has been patched in the official release. Thanks to the 250+ community members who tested it!*

**Quantized by [SolusOps](https://huggingface.co/SolusOps)** 

**Original model:** [flwrlabs/Lizzy-7B](https://huggingface.co/flwrlabs/Lizzy-7B)

**Official Quants:** [flwrlabs/Lizzy-7B-GGUF](https://huggingface.co/flwrlabs/Lizzy-7B-GGUF)

## About This Repo

This repository provides llama.cpp-compatible GGUF quants of **Lizzy-7B**, a UK-centric 7B language model built by [Flower Labs](https://flower.ai).
Refer to the [original model card](https://huggingface.co/flwrlabs/Lizzy-7B) for more details on the model.

## Available Quants

| File | Quant | Size | Use Case |
|---|---|---|---|
| `Lizzy-7B-f16.gguf` | F16 | ~14.6 GB | Needs 20 GB+ VRAM or CPU offload. |
| `Lizzy-7B-Q8_0.gguf` | Q8_0 | ~7.7 GB | **Recommended.** Fits 12 GB VRAM with headroom for context (tested at `n_ctx` 8192 on an RTX 3060). |
| `Lizzy-7B-Q6_K.gguf` | Q6_K | ~5.9 GB | For 10–12 GB GPUs looking to maximize context size. |
| `Lizzy-7B-Q5_K_M.gguf` | Q5_K_M | ~5.1 GB | For 8 GB GPUs. |
| `Lizzy-7B-Q4_K_M.gguf` | Q4_K_M | ~4.1 GB | For 6–8 GB GPUs. |
| `Lizzy-7B-Q3_K_M.gguf` | Q3_K_M | ~3.5 GB | For edge devices, 4 GB GPUs, or older laptops. |

## Hardware Tested

| Hardware | Quant | n_ctx | Speed |
|---|---|---|---|
| RTX 3060 12GB | Q8_0 | 8192 | ~23 tok/s |
| RTX 3060 12GB | F16 | 4096 | Slower (VRAM overflow to RAM) |

## Conversion Notes

### 1. Architecture: OLMo 2 Post-Norm Tensor Mapping

Lizzy-7B uses a Post-Norm variant of OLMo 2.
The standard `convert_hf_to_gguf.py` script does not recognise Flower Labs' tensor naming conventions (`post_attn_norm`, `post_mlp_norm`)
and either fails or silently produces a broken file.
The fix was to register a `LizzyForCausalLM` model class in the llama.cpp conversion script,
subclassing `Olmo2Model` and overriding `modify_tensors()` to handle four norm tensors explicitly. The two divergent names are remapped, and the two QK-norms are routed through the standard mapping:

```python
@ModelBase.register("LizzyForCausalLM")
class LizzyModel(Olmo2Model):
    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:

        # 1. Lizzy: post_attn_norm -> llama.cpp: post_attention_norm
        if name.endswith(".post_attn_norm.weight"):
            yield (f"blk.{bid}.post_attention_norm.weight", data_torch)
            return

        # 2. Lizzy: post_mlp_norm -> llama.cpp: post_ffw_norm
        if name.endswith(".post_mlp_norm.weight"):
            yield (f"blk.{bid}.post_ffw_norm.weight", data_torch)
            return

        # 3. QK-norms: emitted under the standard llama.cpp tensor names
        if name.endswith(".q_norm.weight"):
            yield (self.format_tensor_name(gguf.MODEL_TENSOR.ATTN_Q_NORM, bid), data_torch)
            return

        if name.endswith(".k_norm.weight"):
            yield (self.format_tensor_name(gguf.MODEL_TENSOR.ATTN_K_NORM, bid), data_torch)
            return

        # 4. All other tensors — pass through normally
        yield from super().modify_tensors(data_torch, name, bid)

```
     
No weights were altered. Only the tensor name metadata was remapped.
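
To show the effect of the rename in isolation, here is a minimal sketch of the same rule as a pure function on tensor names. It assumes the checkpoint names layers as `model.layers.<N>.<suffix>`, which is an assumption about the layout rather than something confirmed here. `remap_name` is a hypothetical helper, not part of llama.cpp.

```python
from __future__ import annotations

import re

# Assumption: checkpoint tensors are named "model.layers.<N>.<suffix>".
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)$")

# Lizzy suffix -> GGUF suffix for the two divergent norm tensors,
# mirroring the names used in the LizzyModel class above.
_RENAMES = {
    "post_attn_norm.weight": "post_attention_norm.weight",
    "post_mlp_norm.weight": "post_ffw_norm.weight",
}


def remap_name(name: str) -> str | None:
    """Return the GGUF name for a divergent norm tensor, or None if the
    tensor is not one of the two renamed norms."""
    match = _LAYER_RE.match(name)
    if match is None:
        return None  # not a per-layer tensor (e.g. embeddings)
    block_id, suffix = match.group(1), match.group(2)
    target = _RENAMES.get(suffix)
    return f"blk.{block_id}.{target}" if target else None


print(remap_name("model.layers.3.post_attn_norm.weight"))
```

Only the name changes; the tensor data is passed through untouched, which is why the weights are unaffected.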

### 2. RoPE Scaling Factor Correction

During conversion, the script raised this warning:

```
The explicitly set RoPE scaling factor (config.rope_parameters['factor'] = 8.0)
does not match the ratio implicitly set by other parameters
(implicit factor = max_position_embeddings / original_max_position_embeddings = 4.0).
Using the explicit factor (8.0) in YaRN. This may cause unexpected behaviour.
```

The implicit factor (4.0) is the ratio `max_position_embeddings / original_max_position_embeddings` taken from the model's own config. The explicit `8.0` in the upstream config appears to be an authoring error. To produce a consistent and correctly behaving GGUF, **the factor was corrected from `8.0` to `4.0`** in `config.json` before conversion.

This means the effective context window for these GGUFs reflects the 4.0× YaRN scaling, not 8.0×. If Flower Labs corrects the upstream config, a re-conversion would be straightforward.
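
The consistency check behind that warning can be sketched in a few lines. This is a rough illustration, not the conversion script's actual code. The key names follow the warning text, the lookup location of `original_max_position_embeddings` is an assumption, and the numbers in `example` are placeholders chosen to give a ratio of 4.0, not values read from the real Lizzy-7B config.

```python
from __future__ import annotations

import copy


def implicit_rope_factor(config: dict) -> float:
    """Ratio max_position_embeddings / original_max_position_embeddings."""
    rope = config["rope_parameters"]
    # Assumption: the original length may sit inside rope_parameters
    # or at the top level of the config.
    original = rope.get(
        "original_max_position_embeddings",
        config.get("original_max_position_embeddings"),
    )
    if original is None:
        raise KeyError("original_max_position_embeddings not found")
    return config["max_position_embeddings"] / original


def align_rope_factor(config: dict) -> dict:
    """Return a copy whose explicit factor equals the implicit ratio."""
    fixed = copy.deepcopy(config)
    fixed["rope_parameters"]["factor"] = implicit_rope_factor(config)
    return fixed


# Placeholder values for illustration only.
example = {
    "max_position_embeddings": 16384,
    "rope_parameters": {
        "rope_type": "yarn",
        "factor": 8.0,
        "original_max_position_embeddings": 4096,
    },
}

patched = align_rope_factor(example)
print(patched["rope_parameters"]["factor"])
```

The same mismatch could also be resolved in the other direction, by changing `max_position_embeddings` to match the explicit factor; this sketch follows the choice made for these quants.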

## License

The original Lizzy-7B model is released under **Apache 2.0** by Flower Labs. These quants inherit that license.

## Links

- [Original Model: flwrlabs/Lizzy-7B](https://huggingface.co/flwrlabs/Lizzy-7B)
- [Flower Labs](https://flower.ai)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)

## About Me

This GGUF port was completed by **Anshuman Singh**.

* **GitHub:** [github.com/SolusOps](https://github.com/solusops)
* **LinkedIn:** [linkedin.com/in/anshumansingh2023](https://www.linkedin.com/in/anshumansingh2023/)

If this port helped your local deployment, feel free to connect!