---
language:
- en
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- lizzy-7b
- flwrlabs
- british-english
- text-generation
model_name: Lizzy 7B
---

# Lizzy 7B

## Model Name And Summary

Lizzy 7B is an open-weight Flower Labs assistant model in the Lizzy family.

## Architecture And Configuration

Lizzy 7B is a 7B-class decoder-only transformer with long-context support, sliding/local attention behaviour, custom chat/control tokens, and deployment-specific serving configurations.

Representative configuration points:

- 7B-class parameter scale with a 32-layer stack;
- long-context configuration up to 65k tokens with runtime caps adjusted by deployment profile;
- 32 attention heads with long-context/sliding-attention behaviour;
- custom tokenizer and chat markers for instruction-style prompting;
- deployment variants may include quantised revisions, runtime patches, and serving-time configuration changes.

## Training Approach

Lizzy 7B follows a multi-stage training approach that combines:

- pre-training on large-scale public text, document, code, math, and encyclopedic corpora;
- supervised fine-tuning on instruction-following, dialogue, reasoning, and tool-use examples;
- direct preference optimisation on preference pairs for helpfulness, style, and answer quality;
- reinforcement learning with verifiable rewards for targeted behavioural refinement.

Across these stages, training data has been mixed across:

- broad public text and knowledge sources;
- synthetic instruction and preference data;
- private synthetic data used to favour British behaviour and knowledge;
- UK-specific examples and preference signals used to strengthen local knowledge and style.

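
The direct preference optimisation stage listed above can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch rather than the Lizzy training code: the function names, the `beta` value, and the example log-probabilities are assumptions chosen for illustration, and this card does not state which DPO variant or hyperparameters were used.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from this model card
) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence-level log-probabilities of the chosen and rejected
    responses under the trained policy and under a frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return softplus(-logits)


# Hypothetical log-probabilities: the policy has moved towards the chosen
# response and away from the rejected one, relative to the reference.
loss_improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
# When the policy matches the reference exactly, the loss equals log(2).
loss_neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
print(loss_improved < loss_neutral)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is how preference pairs for helpfulness and style can shape behaviour without a separate reward model.
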
## Evaluation Against European Baselines

Britishness comparisons against the European baselines present in the latest local artifact set:

| Benchmark | Lizzy 7B | EuroLLM 9B | Apertus 8B |
| --- | ---: | ---: | ---: |
| Britishness MCQ | 71.0 | 77.6 | **80.8** |
| Britishness CoT | **80.1** | 72.1 | 31.7 |
| Britishness Domains | **89.9** | 69.0 | 32.6 |

Broader benchmark comparisons against the same European baselines:

| Benchmark | Lizzy 7B | EuroLLM 9B | Apertus 8B |
| --- | ---: | ---: | ---: |
| MATH | **77.9** | 31.3 | 22.4 |
| OMEGA | **29.0** | 4.7 | 5.0 |
| BigBenchHard | **69.0** | 38.9 | 42.4 |
| AGI Eval English | **65.6** | 50.2 | 50.4 |
| MMLU | **67.9** | 57.4 | 63.4 |
| GPQA | **34.6** | 26.8 | 28.1 |
| HumanEvalPlus | **70.2** | 28.2 | 33.4 |
| MBPP+ | **52.5** | 41.7 | 42.3 |
| LiveCodeBench v3 | **39.1** | 6.3 | 8.5 |
| IFEval | 63.8 | 55.8 | **65.1** |
| AIME | **35.8** | 0.2 | 0.6 |
| GSM8K | **91.8** | 64.7 | 64.7 |
| IFBench | **22.7** | 18.0 | 15.3 |
| POPQA | 22.2 | **25.6** | 25.1 |
| ZebraLogic | **12.4** | 4.4 | 5.9 |

Summary:

- Lizzy 7B trails both European baselines on Britishness MCQ, a private Flower Labs benchmark that probes recall-style knowledge.
- Lizzy 7B leads both European baselines on Britishness CoT and Britishness Domains (also private Flower Labs benchmarks), where comparable metrics are available.
- Lizzy 7B also leads both European baselines on 13 of the 15 rows in the broader benchmark table, spanning knowledge, reasoning, math, and coding. The exceptions are IFEval, where Apertus 8B scores higher, and POPQA, where both baselines score higher.

## Intended Uses And Limitations

Intended uses:

- UK-oriented assistant experiences;
- general reasoning and coding assistance;
- managed deployment through private Hugging Face or vLLM serving stacks.

## Safety And Bias Considerations

The latest safety evaluation reports the following task-level primary scores:

| Safety benchmark | Metric | Score |
| --- | --- | ---: |
| Overall safety average | `overall_safety_average` | 66.7% |
| WildGuardTest | `inverted_micro_harm_lower` | 91.9% |
| HarmBench | `inverted_micro_asr_lower` | 57.5% |
| ToxiGen (tiny) | `safe_overall` | 90.2% |
| XSTest | `overall_accuracy` | 85.6% |
| StrongReject (logprobs) | `inverted_asr` | 78.8% |
| BBQ | `accuracy` | 66.5% |
| WMDP | `inverted_accuracy` | 47.5% |

Lizzy 7B can still produce incorrect, outdated, or over-confident responses and should be used with human oversight for higher-risk workflows. UK-specific tuning improves local style and cultural alignment but can also bias tone and assumptions toward UK conventions; downstream moderation and policy controls remain required.

## License And Citation

- Model licence: Apache-2.0.
- Public and synthetic training sources include open-licensed public data plus private synthetic and UK-specific data that are not redistributed.
- Citation and legal text should still be confirmed by owner review before any external publication.

## Python Example (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "flwrlabs/Lizzy-7B"

# trust_remote_code=True runs code shipped in the model repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Lizzy 7B."},
    {"role": "user", "content": "Summarise why queue etiquette matters in the UK."},
]

# Render the conversation with the model's chat template and append the
# marker that opens the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # illustrative cap on response length; adjust as needed
    do_sample=True,      # required for temperature and top_p to take effect
    temperature=0.2,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
print(response)
```

## Multi-GPU vLLM Tensor Parallel Patch

For reproducible multi-GPU vLLM support with Lizzy-family checkpoints, this deliverable bundles a draft patch artifact:

- `vllm_patches/transformers_lizzy_tp.py`

Apply this patch when all of the following are true:

- runtime uses vLLM via the generic Transformers backend (`model_type=vllm`)
- tensor parallelism is enabled (`tensor_parallel_size > 1`)
- checkpoint is Lizzy-family (including RLVR variants)
- runtime is not guaranteed to include an equivalent upstream fix

You can skip patch bundling only for strict HF-only runs or single-rank vLLM (`TP=1`).

Why this is included:

- it mitigates known Lizzy TP failure modes in generic vLLM Transformers loading
- it fixes rank-local head partitioning and `q_norm`/`k_norm` slicing behaviour
- it prevents the known tensor-shape crash class seen without this patch
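
The applicability conditions above can be written as a small predicate, which may be convenient in deployment tooling. This is a sketch under stated assumptions: `ServingProfile`, its field names, the backend labels, and `needs_tp_patch` are hypothetical names introduced here for illustration and are not part of vLLM, Transformers, or the bundled patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingProfile:
    """Hypothetical description of one deployment; not a vLLM or Transformers type."""

    backend: str                   # assumed labels: "vllm-transformers" or "hf"
    tensor_parallel_size: int      # number of tensor-parallel ranks
    lizzy_family: bool             # True for Lizzy checkpoints, including RLVR variants
    upstream_fix_guaranteed: bool  # True if the runtime is known to ship an equivalent fix


def needs_tp_patch(profile: ServingProfile) -> bool:
    """Return True only when every condition listed in this section holds."""
    return (
        profile.backend == "vllm-transformers"
        and profile.tensor_parallel_size > 1
        and profile.lizzy_family
        and not profile.upstream_fix_guaranteed
    )


# Two illustrative profiles: a multi-GPU vLLM deployment and a single-rank one.
multi_gpu = ServingProfile("vllm-transformers", 4, True, False)
single_rank = ServingProfile("vllm-transformers", 1, True, False)
print(needs_tp_patch(multi_gpu), needs_tp_patch(single_rank))
```

Keeping the check as a conjunction mirrors the "all of the following" wording: HF-only runs and single-rank vLLM fall out as cases where at least one condition is false.
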