---
license: apache-2.0
base_model: flwrlabs/Lizzy-7B
tags:
- llama-cpp
- gguf
- olmo2
- quantized
- uk-english
- agentic
- function-calling
- lizzy-7B
language:
- en
pipeline_tag: text-generation
---

# Lizzy-7B GGUF Quants

> 🚨 **Update:** Flower Labs has officially released their native GGUF quants.
> I highly recommend transitioning to their repository for the most stable inference and the corrected 32k context window: **[flwrlabs/Lizzy-7B-GGUF](https://huggingface.co/flwrlabs/Lizzy-7B-GGUF)**.
>
> *Note: During testing, I came across a RoPE/context-length bug, which has been patched in the official release. Thanks to the 250+ community members who tested this early build!*

**Quantized by [SolusOps](https://huggingface.co/SolusOps)**

**Original model:** [FlowerLabs/Lizzy-7B](https://huggingface.co/flwrlabs/Lizzy-7B)

**Official Quants:** [flwrlabs/Lizzy-7B-GGUF](https://huggingface.co/flwrlabs/Lizzy-7B-GGUF)

## About This Repo

This repository provides llama.cpp-compatible GGUF quants of **Lizzy-7B**, a UK-centric 7B language model built by [Flower Labs](https://flower.ai).

Refer to the [original model card](https://huggingface.co/flwrlabs/Lizzy-7B) for more details on the model.

## Available Quants

| File | Quant | Size | Use Case |
|---|---|---|---|
| `Lizzy-7B-f16.gguf` | F16 | ~14.6 GB | Needs 20GB+ VRAM or CPU offload. |
| `Lizzy-7B-Q8_0.gguf` | Q8_0 | ~7.7 GB | **Recommended.** Fits 12GB VRAM with excellent context headroom. |
| `Lizzy-7B-Q6_K.gguf` | Q6_K | ~5.9 GB | For 10GB–12GB GPUs looking to maximize context size. |
| `Lizzy-7B-Q5_K_M.gguf` | Q5_K_M | ~5.1 GB | 8GB VRAM. |
| `Lizzy-7B-Q4_K_M.gguf` | Q4_K_M | ~4.1 GB | 6GB–8GB GPUs. |
| `Lizzy-7B-Q3_K_M.gguf` | Q3_K_M | ~3.5 GB | Edge devices, 4GB GPUs, or older laptops. |

## Hardware Tested

| Hardware | Quant | n_ctx | Speed |
|---|---|---|---|
| RTX 3060 12GB | Q8_0 | 8192 | ~23 tok/s |
| RTX 3060 12GB | F16 | 4096 | Slower (VRAM overflow to RAM) |

## Conversion Notes

### 1. Architecture: OLMo 2 Post-Norm Tensor Mapping

Lizzy-7B uses a Post-Norm variant of OLMo 2. The standard `convert_hf_to_gguf.py` script does not recognise Flower Labs' tensor naming conventions (`post_attn_norm`, `post_mlp_norm`) and will fail or silently produce a broken file.

The fix was to register a `LizzyForCausalLM` model class in the llama.cpp conversion script, subclassing `Olmo2Model` and overriding `modify_tensors()` to remap the four divergent tensor names:

```python
# Added inside llama.cpp's convert_hf_to_gguf.py; it relies on names already
# in scope there (ModelBase, Olmo2Model, gguf, Tensor, Iterable).
@ModelBase.register("LizzyForCausalLM")
class LizzyModel(Olmo2Model):
    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
        # 1. Lizzy: post_attn_norm -> llama.cpp: post_attention_norm
        if name.endswith(".post_attn_norm.weight"):
            yield (f"blk.{bid}.post_attention_norm.weight", data_torch)
            return

        # 2. Lizzy: post_mlp_norm -> llama.cpp: post_ffw_norm
        if name.endswith(".post_mlp_norm.weight"):
            yield (f"blk.{bid}.post_ffw_norm.weight", data_torch)
            return

        # 3. QK-norms: mapped through the standard tensor-name helper
        if name.endswith(".q_norm.weight"):
            yield (self.format_tensor_name(gguf.MODEL_TENSOR.ATTN_Q_NORM, bid), data_torch)
            return
        if name.endswith(".k_norm.weight"):
            yield (self.format_tensor_name(gguf.MODEL_TENSOR.ATTN_K_NORM, bid), data_torch)
            return

        # 4. All other tensors: pass through to the parent class unchanged
        yield from super().modify_tensors(data_torch, name, bid)
```

No weights were altered. Only the tensor name metadata was remapped.

### 2. RoPE Scaling Factor Correction

During conversion, the script raised this warning:

```
The explicitly set RoPE scaling factor (config.rope_parameters['factor'] = 8.0) does not match the ratio implicitly set by other parameters (implicit factor = max_position_embeddings / original_max_position_embeddings = 4.0). Using the explicit factor (8.0) in YaRN. This may cause unexpected behaviour.
```

The implicit factor (4.0) is mathematically derived from the model's own position embedding settings.
The explicit `8.0` in the upstream config appears to be an authoring error. To produce a consistent and correctly-behaving GGUF, **the factor was corrected from `8.0` to `4.0`** in `config.json` before conversion. This means the effective context window for these GGUFs reflects the 4.0× YaRN scaling, not 8.0×. If Flower Labs corrects the upstream config, a re-conversion would be straightforward.

## License

The original Lizzy-7B model is released under **Apache 2.0** by Flower Labs. These quants inherit that license.

## Links

- [Original Model: FlowerLabs/Lizzy-7B](https://huggingface.co/flwrlabs/Lizzy-7B)
- [Flower Labs](https://flower.ai)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)

### About Me

This GGUF port was completed by **Anshuman Singh**.

* **GitHub:** [github.com/SolusOps](https://github.com/solusops)
* **LinkedIn:** [linkedin.com/in/anshumansingh2023](https://www.linkedin.com/in/anshumansingh2023/)

If this port helped your local deployment, feel free to connect!
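
## Appendix: Checking the RoPE Factor

The RoPE factor mismatch described under Conversion Notes can be checked with a few lines of Python. This is a minimal sketch, not the script used for this port. It assumes `config.json` keeps `factor` and `original_max_position_embeddings` under `rope_parameters` and `max_position_embeddings` at the top level, a layout inferred from the warning text rather than confirmed against the upstream file. The helper names `implicit_rope_factor` and `reconcile_rope_factor` are hypothetical, and the sample numbers are illustrative values chosen only so that the ratio comes out at 4.0.

```python
import copy


def implicit_rope_factor(config: dict) -> float:
    """Ratio of extended to original context length implied by the config."""
    rope = config.get("rope_parameters", {})
    # Assumption: the original length lives in rope_parameters, else at top level.
    original = rope.get(
        "original_max_position_embeddings",
        config.get("original_max_position_embeddings"),
    )
    if not original:
        raise KeyError("original_max_position_embeddings not found in config")
    return config["max_position_embeddings"] / original


def reconcile_rope_factor(config: dict) -> dict:
    """Return a copy of config whose explicit factor matches the implicit one."""
    fixed = copy.deepcopy(config)
    rope = fixed.setdefault("rope_parameters", {})
    implicit = implicit_rope_factor(fixed)
    if rope.get("factor") != implicit:
        rope["factor"] = implicit
    return fixed


# Illustrative values only, chosen so the ratio is 4.0; not read from the real config.
sample = {
    "max_position_embeddings": 32768,
    "rope_parameters": {
        "rope_type": "yarn",
        "factor": 8.0,
        "original_max_position_embeddings": 8192,
    },
}

patched = reconcile_rope_factor(sample)
print(implicit_rope_factor(sample))          # 4.0
print(patched["rope_parameters"]["factor"])  # 4.0
```

To apply the same check to a real checkpoint, load `config.json` with `json.load`, pass the resulting dict through the helper, and write it back before running the conversion script.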