---
license: apache-2.0
base_model: flwrlabs/Lizzy-7B
tags:
- llama-cpp
- gguf
- olmo2
- quantized
- uk-english
- agentic
- function-calling
- lizzy-7B
language:
- en
pipeline_tag: text-generation
---
# Lizzy-7B GGUF Quants
> 🚨 **Update:** Flower Labs has officially released their native GGUF quants.
> I highly recommend transitioning to their repository for the most stable inference and the corrected 32k context window: **[flwrlabs/Lizzy-7B-GGUF](https://huggingface.co/flwrlabs/Lizzy-7B-GGUF)**.
>
> *Note: During testing, I came across a RoPE/context-length bug, which has been patched in the official release. Thanks to the 250+ community members who tested this early build!*

**Quantized by [SolusOps](https://huggingface.co/SolusOps)**

**Original model:** [flwrlabs/Lizzy-7B](https://huggingface.co/flwrlabs/Lizzy-7B)

**Official Quants:** [flwrlabs/Lizzy-7B-GGUF](https://huggingface.co/flwrlabs/Lizzy-7B-GGUF)
## About This Repo
This repository provides llama.cpp-compatible GGUF quants of **Lizzy-7B**, a UK-centric 7B language model built by [Flower Labs](https://flower.ai).
Refer to the [original model card](https://huggingface.co/flwrlabs/Lizzy-7B) for more details on the model.
## Available Quants
| File | Quant | Size | Use Case |
|---|---|---|---|
| `Lizzy-7B-f16.gguf` | F16 | ~14.6 GB | Needs 20 GB+ VRAM or CPU offload. |
| `Lizzy-7B-Q8_0.gguf` | Q8_0 | ~7.7 GB | **Recommended.** Fits in 12 GB VRAM with excellent context headroom. |
| `Lizzy-7B-Q6_K.gguf` | Q6_K | ~5.9 GB | 10–12 GB GPUs looking to maximize context size. |
| `Lizzy-7B-Q5_K_M.gguf` | Q5_K_M | ~5.1 GB | 8 GB GPUs. |
| `Lizzy-7B-Q4_K_M.gguf` | Q4_K_M | ~4.1 GB | 6–8 GB GPUs. |
| `Lizzy-7B-Q3_K_M.gguf` | Q3_K_M | ~3.5 GB | Edge devices, 4 GB GPUs, or older laptops. |
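
As a rough aid for choosing a file, the table can be turned into a lookup. This is only a sketch: the helper name `pick_quant` is hypothetical, and the VRAM floors are taken as the lower bound of each "Use Case" entry, which is an interpretation of the table rather than a measured limit.

```python
from typing import Optional

# (filename, approx. file size in GB, assumed minimum VRAM in GB).
# Filenames and sizes come from the table; the VRAM floors are the lower
# bound of each "Use Case" entry and are an assumption, not a benchmark.
QUANTS = [
    ("Lizzy-7B-f16.gguf", 14.6, 20),
    ("Lizzy-7B-Q8_0.gguf", 7.7, 12),
    ("Lizzy-7B-Q6_K.gguf", 5.9, 10),
    ("Lizzy-7B-Q5_K_M.gguf", 5.1, 8),
    ("Lizzy-7B-Q4_K_M.gguf", 4.1, 6),
    ("Lizzy-7B-Q3_K_M.gguf", 3.5, 4),
]


def pick_quant(vram_gb: float) -> Optional[str]:
    """Return the largest file whose assumed VRAM floor fits, else None."""
    for filename, _size_gb, min_vram_gb in QUANTS:  # ordered largest first
        if vram_gb >= min_vram_gb:
            return filename
    return None


if __name__ == "__main__":
    for vram in (4, 8, 12, 24):
        print(vram, "GB ->", pick_quant(vram))
```

Longer contexts need more memory for the KV cache, so dropping one row below what this returns may be the safer choice when a large `n_ctx` matters.
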
## Hardware Tested
| Hardware | Quant | n_ctx | Speed |
|---|---|---|---|
| RTX 3060 12GB | Q8_0 | 8192 | ~23 tok/s |
| RTX 3060 12GB | F16 | 4096 | Slower (model overflows VRAM into system RAM) |
## Conversion Notes
### 1. Architecture: OLMo 2 Post-Norm Tensor Mapping
Lizzy-7B uses a Post-Norm variant of OLMo 2. The standard `convert_hf_to_gguf.py` script does not recognise the Flower Labs tensor naming conventions (`post_attn_norm`, `post_mlp_norm`) and will fail or silently produce a broken file.

The fix was to register a `LizzyForCausalLM` model class in the llama.cpp conversion script, subclassing `Olmo2Model` and overriding `modify_tensors()` to handle four tensor names: the two divergent post-norm names are remapped explicitly, and the two QK-norm names are routed through the standard mapping.
```python
# Added inside llama.cpp's convert_hf_to_gguf.py, which already provides
# ModelBase, Olmo2Model, gguf, Tensor and Iterable.
@ModelBase.register("LizzyForCausalLM")
class LizzyModel(Olmo2Model):
    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
        # 1. Lizzy: post_attn_norm -> llama.cpp: post_attention_norm
        if name.endswith(".post_attn_norm.weight"):
            yield (f"blk.{bid}.post_attention_norm.weight", data_torch)
            return

        # 2. Lizzy: post_mlp_norm -> llama.cpp: post_ffw_norm
        if name.endswith(".post_mlp_norm.weight"):
            yield (f"blk.{bid}.post_ffw_norm.weight", data_torch)
            return

        # 3. QK-norms: mapped via the standard tensor-name paths
        if name.endswith(".q_norm.weight"):
            yield (self.format_tensor_name(gguf.MODEL_TENSOR.ATTN_Q_NORM, bid), data_torch)
            return
        if name.endswith(".k_norm.weight"):
            yield (self.format_tensor_name(gguf.MODEL_TENSOR.ATTN_K_NORM, bid), data_torch)
            return

        # 4. All other tensors: pass through normally
        yield from super().modify_tensors(data_torch, name, bid)
```
No weights were altered. Only the tensor name metadata was remapped.
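
The renaming rule for the two post-norm tensors can be checked on plain strings, without llama.cpp or a checkpoint. The sketch below is a standalone illustration, not part of the conversion script: the helper name `remap_norm_name` is hypothetical, and the `model.layers.<n>.` prefix on the source names is an assumption about the checkpoint layout. Only the suffixes and the `blk.<n>.` targets come from the class above.

```python
import re
from typing import Optional

# Source suffix -> llama.cpp suffix, as listed in the conversion notes.
RENAMES = {
    "post_attn_norm.weight": "post_attention_norm.weight",
    "post_mlp_norm.weight": "post_ffw_norm.weight",
}

# Assumed source layout: "<prefix>.layers.<block index>.<rest>".
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def remap_norm_name(name: str) -> Optional[str]:
    """Return the GGUF-style name for a divergent norm tensor, else None."""
    match = LAYER_RE.search(name)
    if match is None:
        return None  # not a per-layer tensor
    for src, dst in RENAMES.items():
        if name.endswith("." + src):
            return f"blk.{match.group(1)}.{dst}"
    return None  # per-layer tensor that needs no special handling


if __name__ == "__main__":
    print(remap_norm_name("model.layers.3.post_attn_norm.weight"))
    print(remap_norm_name("model.layers.3.post_mlp_norm.weight"))
```

Returning `None` for everything else mirrors the fall-through to `super().modify_tensors()` in the real class.
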
### 2. RoPE Scaling Factor Correction
During conversion, the script raised this warning:
```
The explicitly set RoPE scaling factor (config.rope_parameters['factor'] = 8.0)
does not match the ratio implicitly set by other parameters
(implicit factor = max_position_embeddings / original_max_position_embeddings = 4.0).
Using the explicit factor (8.0) in YaRN. This may cause unexpected behaviour.
```
The implicit factor (4.0) follows directly from the model's own position embedding settings (`max_position_embeddings / original_max_position_embeddings`). The explicit `8.0` in the upstream config appears to be an authoring error. To produce a consistent, correctly behaving GGUF, **the factor was corrected from `8.0` to `4.0`** in `config.json` before conversion.
This means the effective context window for these GGUFs reflects the 4.0× YaRN scaling, not 8.0×. If Flower Labs corrects the upstream config, a re-conversion would be straightforward.
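
The consistency check behind that warning is a single division. The sketch below shows the idea on an in-memory dict. The helper name `reconcile_rope_factor` is hypothetical, the position values are illustrative numbers chosen only to give the 4.0 ratio, and placing `original_max_position_embeddings` inside `rope_parameters` is an assumption about the config layout, so check the real `config.json` before adapting it.

```python
def reconcile_rope_factor(config: dict) -> dict:
    """Return a copy of config whose explicit factor equals the implicit one."""
    rope = dict(config["rope_parameters"])  # copy so the input is untouched
    implicit = (
        config["max_position_embeddings"]
        / rope["original_max_position_embeddings"]
    )
    if rope.get("factor") != implicit:
        rope["factor"] = implicit
    return {**config, "rope_parameters": rope}


# Illustrative values only; they are not read from the real Lizzy-7B config.
example = {
    "max_position_embeddings": 32768,
    "rope_parameters": {
        "factor": 8.0,
        "original_max_position_embeddings": 8192,
    },
}

patched = reconcile_rope_factor(example)
print(patched["rope_parameters"]["factor"])
```

Working on a copy keeps the upstream values available for comparison, which helps if the upstream config later changes and a re-conversion is needed.
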
## License
The original Lizzy-7B model is released under **Apache 2.0** by Flower Labs. These quants inherit that license.
## Links
- [Original Model: flwrlabs/Lizzy-7B](https://huggingface.co/flwrlabs/Lizzy-7B)
- [Flower Labs](https://flower.ai)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
### About Me
This GGUF port was completed by **Anshuman Singh**.
* **GitHub:** [github.com/SolusOps](https://github.com/solusops)
* **LinkedIn:** [linkedin.com/in/anshumansingh2023](https://www.linkedin.com/in/anshumansingh2023/)
If this port helped your local deployment, feel free to connect!