---
language:
- en
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- lizzy-7b
- flwrlabs
- british-english
- text-generation
model_name: Lizzy 7B
---
# Lizzy 7B
## Model Name And Summary
Lizzy 7B is an open-weight Flower Labs assistant model in the Lizzy family.
## Architecture And Configuration
Lizzy 7B is a 7B-class decoder-only transformer with long-context support, sliding/local attention behaviour, custom chat/control tokens, and deployment-specific serving configurations.
Representative configuration points:
- 7B-class parameter scale with a 32-layer stack;
- long-context configuration up to 65k tokens with runtime caps adjusted by deployment profile;
- 32 attention heads with long-context/sliding-attention behaviour;
- custom tokenizer and chat markers for instruction-style prompting;
- deployment variants may include quantised revisions, runtime patches, and serving-time configuration changes.
## Training Approach
Lizzy 7B follows a multi-stage training approach that combines:
- pre-training on large-scale public text, document, code, math, and encyclopedic corpora;
- supervised fine-tuning on instruction-following, dialogue, reasoning, and tool-use examples;
- direct preference optimisation on preference pairs for helpfulness, style, and answer quality;
- reinforcement learning with verifiable rewards for targeted behavioural refinement.
Across these stages, training data has been mixed across:
- broad public text and knowledge sources;
- synthetic instruction and preference data;
- private synthetic data used to favour British behaviour and knowledge;
- UK-specific examples and preference signals used to strengthen local knowledge and style.
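As a rough illustration of the direct preference optimisation stage listed above, the sketch below computes the standard DPO loss for a single preference pair. It is the generic published objective, not Flower Labs' training code; the helper name `dpo_pair_loss`, the `beta` value, and the example log-probabilities are illustrative assumptions.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one preference pair (hypothetical helper).

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen reference.
    """
    # How much more the policy favours each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    return math.log1p(math.exp(-abs(logits))) + max(-logits, 0.0)


# Illustrative numbers only: the policy already leans towards the chosen response.
loss = dpo_pair_loss(-12.0, -20.0, -14.0, -18.0, beta=0.1)
print(round(loss, 4))
```

When the policy and reference agree exactly, the loss equals `log(2)`; it falls below that as the policy prefers the chosen response more strongly than the reference does.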
## Evaluation Against European Baselines
Britishness comparisons against the European baselines present in the latest local artifact set (bold marks the best score in each row):
| Benchmark | Lizzy 7B | EuroLLM 9B | Apertus 8B |
| --- | ---: | ---: | ---: |
| Britishness MCQ | 71.0 | 77.6 | **80.8** |
| Britishness CoT | **80.1** | 72.1 | 31.7 |
| Britishness Domains | **89.9** | 69.0 | 32.6 |
Broader benchmark comparisons against the same European baselines:
| Benchmark | Lizzy 7B | EuroLLM 9B | Apertus 8B |
| --- | ---: | ---: | ---: |
| MATH | **77.9** | 31.3 | 22.4 |
| OMEGA | **29.0** | 4.7 | 5.0 |
| BigBenchHard | **69.0** | 38.9 | 42.4 |
| AGI Eval English | **65.6** | 50.2 | 50.4 |
| MMLU | **67.9** | 57.4 | 63.4 |
| GPQA | **34.6** | 26.8 | 28.1 |
| HumanEvalPlus | **70.2** | 28.2 | 33.4 |
| MBPP+ | **52.5** | 41.7 | 42.3 |
| LiveCodeBench v3 | **39.1** | 6.3 | 8.5 |
| IFEval | 63.8 | 55.8 | **65.1** |
| AIME | **35.8** | 0.2 | 0.6 |
| GSM8K | **91.8** | 64.7 | 64.7 |
| IFBench | **22.7** | 18.0 | 15.3 |
| POPQA | 22.2 | **25.6** | 25.1 |
| ZebraLogic | **12.4** | 4.4 | 5.9 |
Summary:
- Lizzy 7B trails both European baselines on Britishness MCQ, a private Flower Labs benchmark of recall-style probing (71.0 against 77.6 and 80.8).
- Lizzy 7B leads both European baselines on Britishness CoT and Britishness Domains, also private Flower Labs benchmarks, where comparable metrics are available.
- Lizzy 7B also leads the latest local European baseline set on 13 of the 15 knowledge, reasoning, math, and coding rows in the table above, trailing only on IFEval and POPQA.
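The row-level comparison behind the last summary point can be reproduced with a short tally. This is a minimal sketch that copies the scores from the broader benchmark table; the variable names are illustrative.

```python
# Scores copied from the broader benchmark table:
# (Lizzy 7B, EuroLLM 9B, Apertus 8B).
scores = {
    "MATH": (77.9, 31.3, 22.4),
    "OMEGA": (29.0, 4.7, 5.0),
    "BigBenchHard": (69.0, 38.9, 42.4),
    "AGI Eval English": (65.6, 50.2, 50.4),
    "MMLU": (67.9, 57.4, 63.4),
    "GPQA": (34.6, 26.8, 28.1),
    "HumanEvalPlus": (70.2, 28.2, 33.4),
    "MBPP+": (52.5, 41.7, 42.3),
    "LiveCodeBench v3": (39.1, 6.3, 8.5),
    "IFEval": (63.8, 55.8, 65.1),
    "AIME": (35.8, 0.2, 0.6),
    "GSM8K": (91.8, 64.7, 64.7),
    "IFBench": (22.7, 18.0, 15.3),
    "POPQA": (22.2, 25.6, 25.1),
    "ZebraLogic": (12.4, 4.4, 5.9),
}

# Rows where Lizzy 7B has the strictly highest score.
lizzy_leads = [
    name for name, (lizzy, *others) in scores.items() if lizzy > max(others)
]
lizzy_trails = [name for name in scores if name not in lizzy_leads]

print(f"Lizzy 7B leads on {len(lizzy_leads)} of {len(scores)} rows")
print("Does not lead on:", ", ".join(lizzy_trails))
```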
## Intended Uses And Limitations
Intended uses:
- UK-oriented assistant experiences;
- general reasoning and coding assistance;
- managed deployment through private Hugging Face or vLLM serving stacks.
## Safety And Bias Considerations
The latest safety evaluation reports the following task-level primary scores:
| Safety benchmark | Metric | Score |
| --- | --- | ---: |
| Overall safety average | `overall_safety_average` | 66.7% |
| WildGuardTest | `inverted_micro_harm_lower` | 91.9% |
| HarmBench | `inverted_micro_asr_lower` | 57.5% |
| ToxiGen (tiny) | `safe_overall` | 90.2% |
| XSTest | `overall_accuracy` | 85.6% |
| StrongReject (logprobs) | `inverted_asr` | 78.8% |
| BBQ | `accuracy` | 66.5% |
| WMDP | `inverted_accuracy` | 47.5% |
Lizzy 7B can still produce incorrect, outdated, or over-confident responses and should be used with human oversight for higher-risk workflows. UK-specific tuning improves local style and cultural alignment but can also bias tone and assumptions toward UK conventions; downstream moderation and policy controls remain required.
## License And Citation
- Model licence: Apache-2.0
- Training sources combine open-licensed public data with private synthetic and UK-specific data that are not redistributed
- Citation and legal text should still be confirmed by owner review before any external publication.
## Python Example (Transformers)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "flwrlabs/Lizzy-7B"

# Load the tokenizer and model; trust_remote_code permits custom code from the repo.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Lizzy 7B."},
    {"role": "user", "content": "Summarise why queue etiquette matters in the UK."},
]

# Render the conversation with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # illustrative cap on response length; adjust as needed
    do_sample=True,  # sampling must be on for temperature and top_p to apply
    temperature=0.2,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
print(response)
```
## Multi-GPU vLLM Tensor Parallel Patch
For reproducible multi-GPU vLLM support with Lizzy-family checkpoints, this deliverable bundles a draft patch artifact:
- `vllm_patches/transformers_lizzy_tp.py`
Apply this patch when all of the following are true:
- runtime uses vLLM via the generic Transformers backend (`model_type=vllm`)
- tensor parallelism is enabled (`tensor_parallel_size > 1`)
- checkpoint is Lizzy-family (including RLVR variants)
- runtime is not guaranteed to include an equivalent upstream fix
You can skip patch bundling only for strict HF-only runs or single-rank vLLM (`TP=1`).
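The conditions above can be expressed as a single predicate. The sketch below is only an illustration of that decision logic; the function and parameter names are hypothetical and are not part of vLLM or the bundled patch.

```python
def needs_lizzy_tp_patch(
    uses_vllm_transformers_backend: bool,
    tensor_parallel_size: int,
    is_lizzy_family: bool,
    upstream_fix_guaranteed: bool,
) -> bool:
    """Return True only when every condition for applying the patch holds."""
    return (
        uses_vllm_transformers_backend
        and tensor_parallel_size > 1
        and is_lizzy_family
        and not upstream_fix_guaranteed
    )


# Two-way tensor parallelism on a Lizzy checkpoint with no upstream fix: patch.
print(needs_lizzy_tp_patch(True, 2, True, False))
# Single-rank vLLM (TP=1): the patch can be skipped.
print(needs_lizzy_tp_patch(True, 1, True, False))
```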
Why this is included:
- it mitigates known Lizzy TP failure modes in generic vLLM Transformers loading
- it fixes rank-local head partitioning and `q_norm`/`k_norm` slicing behaviour
- it prevents the known tensor-shape crash class seen without this patch
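To illustrate the head-partitioning point above, the sketch below computes which attention heads a tensor-parallel rank owns and the matching slice of a norm weight. It is not the bundled patch: the helper name is hypothetical, `head_dim=128` is an assumed value, and it assumes the `q_norm` weight spans all heads so that each rank needs only its own contiguous slice.

```python
def rank_local_slices(num_heads: int, head_dim: int, tp_size: int, rank: int):
    """Return (head_range, norm_slice) for one tensor-parallel rank.

    Hypothetical helper: assumes heads are split evenly and contiguously, and
    that the norm weight has num_heads * head_dim entries.
    """
    if num_heads % tp_size != 0:
        raise ValueError("num_heads must be divisible by tp_size")
    if not 0 <= rank < tp_size:
        raise ValueError("rank must satisfy 0 <= rank < tp_size")
    heads_per_rank = num_heads // tp_size
    first_head = rank * heads_per_rank
    head_range = range(first_head, first_head + heads_per_rank)
    # Slice of the full norm weight that lines up with this rank's heads.
    norm_slice = slice(first_head * head_dim,
                       (first_head + heads_per_rank) * head_dim)
    return head_range, norm_slice


# 32 heads matches the configuration listed earlier; head_dim=128 is assumed.
heads, norm_slice = rank_local_slices(num_heads=32, head_dim=128, tp_size=4, rank=1)
print(list(heads), norm_slice)
```

Under these assumptions, applying an unsliced, full-width norm weight to a rank's narrower query or key projection is one plausible way a tensor-shape mismatch could arise.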