Instructions to use flwrlabs/Lizzy-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use flwrlabs/Lizzy-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="flwrlabs/Lizzy-7B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("flwrlabs/Lizzy-7B", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use flwrlabs/Lizzy-7B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "flwrlabs/Lizzy-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "flwrlabs/Lizzy-7B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/flwrlabs/Lizzy-7B
```
- SGLang
How to use flwrlabs/Lizzy-7B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "flwrlabs/Lizzy-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "flwrlabs/Lizzy-7B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "flwrlabs/Lizzy-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "flwrlabs/Lizzy-7B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use flwrlabs/Lizzy-7B with Docker Model Runner:
docker model run hf.co/flwrlabs/Lizzy-7B

---
language:
- en
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- lizzy-7b
- flwrlabs
- british-english
- text-generation
model_name: Lizzy 7B
---

# Lizzy 7B

<img class="dark:hidden" src="./header-light.svg" alt="Lizzy 7B header figure (light theme)" />
<img class="hidden dark:block" src="./header-dark.svg" alt="Lizzy 7B header figure (dark theme)" />

## Model Name And Summary

Lizzy 7B is an open-weight Flower Labs assistant model in the Lizzy family.

## Architecture And Configuration

Lizzy 7B is a 7B-class decoder-only transformer with long-context support, sliding/local attention behaviour, custom chat/control tokens, and deployment-specific serving configurations.

Representative configuration points:

- 7B-class parameter scale with a 32-layer stack;
- long-context configuration up to 65k tokens with runtime caps adjusted by deployment profile;
- 32 attention heads with long-context/sliding-attention behaviour;
- custom tokenizer and chat markers for instruction-style prompting;
- deployment variants may include quantised revisions, runtime patches, and serving-time configuration changes.
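
The relationship between the model's long-context limit and a per-deployment runtime cap can be sketched as below. This is a minimal illustration rather than the card's serving code: the `DeploymentProfile` type, the profile names, and the cap values are hypothetical, and 65,536 is an assumed reading of the "65k tokens" figure above.

```python
from dataclasses import dataclass

# Assumed reading of the "65k tokens" figure; the card does not state the exact value.
MODEL_MAX_CONTEXT = 65_536


@dataclass(frozen=True)
class DeploymentProfile:
    """Hypothetical serving profile; names and caps are illustrative only."""
    name: str
    runtime_context_cap: int


def effective_context(profile: DeploymentProfile, model_max: int = MODEL_MAX_CONTEXT) -> int:
    """Return the usable context length: a runtime cap can lower the limit but never raise it."""
    if profile.runtime_context_cap <= 0:
        raise ValueError("runtime_context_cap must be positive")
    return min(profile.runtime_context_cap, model_max)


profiles = [
    DeploymentProfile("small-gpu", 16_384),
    DeploymentProfile("full-context", 131_072),
]
for profile in profiles:
    print(profile.name, effective_context(profile))
```

A profile may lower the usable window to fit memory, but a cap above the model limit is clamped back to that limit.
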
## Training Approach

Lizzy 7B follows a multi-stage training approach that combines:

- pre-training on large-scale public text, document, code, math, and encyclopedic corpora;
- supervised fine-tuning on instruction-following, dialogue, reasoning, and tool-use examples;
- direct preference optimisation on preference pairs for helpfulness, style, and answer quality;
- reinforcement learning with verifiable rewards for targeted behavioural refinement.

Across these stages, training data has been mixed across:

- broad public text and knowledge sources;
- synthetic instruction and preference data;
- private synthetic data used to favour British behaviour and knowledge;
- UK-specific examples and preference signals used to strengthen local knowledge and style.
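
For readers unfamiliar with the direct preference optimisation stage listed above, the standard per-pair DPO objective can be sketched in a few lines. The card does not describe Lizzy's actual training code, so this shows the generic textbook loss only; the function name, argument names, and `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the card
) -> float:
    """Generic DPO loss for one preference pair, from summed sequence log-probabilities."""
    # How much more the policy favours the chosen answer over the rejected one,
    # relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# A policy that prefers the chosen answer more than the reference does gets a lower loss.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss falls as the policy widens its preference for the chosen response relative to the reference, which is what lets preference pairs steer style and answer quality without a separate reward model.
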
## Evaluation Against European Baselines

Britishness benchmark scores for Lizzy 7B and the European baselines present in the latest local artifact set. In both tables, bold marks the best score in a row and underline the second best:

| Benchmark | Lizzy 7B | EuroLLM 9B | Apertus 8B |
| --- | ---: | ---: | ---: |
| Britishness MCQ | 71.0 | <u>77.6</u> | **80.8** |
| Britishness CoT | **80.1** | <u>72.1</u> | 31.7 |
| Britishness Domains | **89.9** | <u>69.0</u> | 32.6 |

Broader benchmark comparisons against the same European baselines:

| Benchmark | Lizzy 7B | EuroLLM 9B | Apertus 8B |
| --- | ---: | ---: | ---: |
| MATH | **77.9** | <u>31.3</u> | 22.4 |
| OMEGA | **29.0** | 4.7 | <u>5.0</u> |
| BigBenchHard | **69.0** | 38.9 | <u>42.4</u> |
| AGI Eval English | **65.6** | 50.2 | <u>50.4</u> |
| MMLU | **67.9** | 57.4 | <u>63.4</u> |
| GPQA | **34.6** | 26.8 | <u>28.1</u> |
| HumanEvalPlus | **70.2** | 28.2 | <u>33.4</u> |
| MBPP+ | **52.5** | 41.7 | <u>42.3</u> |
| LiveCodeBench v3 | **39.1** | 6.3 | <u>8.5</u> |
| IFEval | <u>63.8</u> | 55.8 | **65.1** |
| AIME | **35.8** | 0.2 | <u>0.6</u> |
| GSM8K | **91.8** | <u>64.7</u> | 64.7 |
| IFBench | **22.7** | <u>18.0</u> | 15.3 |
| POPQA | 22.2 | **25.6** | <u>25.1</u> |
| ZebraLogic | **12.4** | 4.4 | <u>5.9</u> |

Summary:

- Lizzy 7B trails both European baselines on Britishness MCQ, a private Flower Labs benchmark of recall-style probing.
- Lizzy 7B leads the reported European baselines on Britishness CoT and Britishness Domains (private Flower Labs reasoning benchmarks) where comparable metrics are available.
- Lizzy 7B also leads the latest local European baseline set on 13 of the 15 knowledge, reasoning, math, and coding rows in the broader table; the exceptions are IFEval and POPQA.
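
The summary of the broader table can be checked directly against the reported numbers. The short script below copies the scores from that table and counts the rows on which Lizzy 7B is strictly ahead of both baselines; the variable and function names are illustrative.

```python
# Scores copied from the broader benchmark table: (Lizzy 7B, EuroLLM 9B, Apertus 8B).
SCORES = {
    "MATH": (77.9, 31.3, 22.4),
    "OMEGA": (29.0, 4.7, 5.0),
    "BigBenchHard": (69.0, 38.9, 42.4),
    "AGI Eval English": (65.6, 50.2, 50.4),
    "MMLU": (67.9, 57.4, 63.4),
    "GPQA": (34.6, 26.8, 28.1),
    "HumanEvalPlus": (70.2, 28.2, 33.4),
    "MBPP+": (52.5, 41.7, 42.3),
    "LiveCodeBench v3": (39.1, 6.3, 8.5),
    "IFEval": (63.8, 55.8, 65.1),
    "AIME": (35.8, 0.2, 0.6),
    "GSM8K": (91.8, 64.7, 64.7),
    "IFBench": (22.7, 18.0, 15.3),
    "POPQA": (22.2, 25.6, 25.1),
    "ZebraLogic": (12.4, 4.4, 5.9),
}


def rows_where_lizzy_leads(scores: dict) -> list:
    """Return benchmark names where the first score strictly exceeds every baseline score."""
    return [name for name, (lizzy, *baselines) in scores.items() if lizzy > max(baselines)]


leads = rows_where_lizzy_leads(SCORES)
trails = [name for name in SCORES if name not in leads]
print(f"Lizzy 7B leads on {len(leads)} of {len(SCORES)} rows; trails on: {', '.join(trails)}")
# prints: Lizzy 7B leads on 13 of 15 rows; trails on: IFEval, POPQA
```
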
## Intended Uses And Limitations

Intended uses:

- UK-oriented assistant experiences;
- general reasoning and coding assistance;
- managed deployment through private Hugging Face or vLLM serving stacks.

## Safety And Bias Considerations

The latest safety evaluation reports the following task-level primary scores:

| Safety benchmark | Metric | Score |
| --- | --- | ---: |
| Overall safety average | `overall_safety_average` | 66.7% |
| WildGuardTest | `inverted_micro_harm_lower` | 91.9% |
| HarmBench | `inverted_micro_asr_lower` | 57.5% |
| ToxiGen (tiny) | `safe_overall` | 90.2% |
| XSTest | `overall_accuracy` | 85.6% |
| StrongReject (logprobs) | `inverted_asr` | 78.8% |
| BBQ | `accuracy` | 66.5% |
| WMDP | `inverted_accuracy` | 47.5% |

Lizzy 7B can still produce incorrect, outdated, or over-confident responses and should be used with human oversight for higher-risk workflows. UK-specific tuning improves local style and cultural alignment but can also bias tone and assumptions toward UK conventions; downstream moderation and policy controls remain required.
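
Several metric names in the safety table carry an `inverted_` prefix. The card does not define them, so the sketch below rests on one explicit assumption: an inverted score is 100 minus the underlying rate (for example, attack success rate), so that higher always means safer. Under that assumption, the raw rates can be recovered as follows; the function name is hypothetical.

```python
def underlying_rate(inverted_score_pct: float) -> float:
    """Recover the raw rate from an inverted percentage score.

    Assumption (not confirmed by the card): inverted score = 100 - raw rate.
    """
    if not 0.0 <= inverted_score_pct <= 100.0:
        raise ValueError("score must be a percentage between 0 and 100")
    return round(100.0 - inverted_score_pct, 1)


# Inverted scores copied from the safety table.
INVERTED_SCORES = {
    "WildGuardTest": 91.9,
    "HarmBench": 57.5,
    "StrongReject (logprobs)": 78.8,
    "WMDP": 47.5,
}

for task, score in INVERTED_SCORES.items():
    print(f"{task}: inverted {score}% -> assumed raw rate {underlying_rate(score)}%")
```

If the assumption holds, the HarmBench row would correspond to an attack success rate of 42.5%, which is one concrete reason the card asks for downstream moderation.
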
## License And Citation

- Model licence: Apache-2.0.
- Training sources include openly licensed public data plus private synthetic and UK-specific data that are not redistributed.
- Citation and legal text should still be confirmed by owner review before any external publication.

## Python Example (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "flwrlabs/Lizzy-7B"

# The repository is tagged custom_code, so remote code must be trusted explicitly.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Lizzy 7B."},
    {"role": "user", "content": "Summarise why queue etiquette matters in the UK."},
]

# Render the chat messages into the model's prompt format.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # set an explicit budget; the default length may be very short
    do_sample=True,      # enable sampling so temperature and top_p take effect
    temperature=0.2,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```

## Multi-GPU vLLM Tensor Parallel Patch

For reproducible multi-GPU vLLM support with Lizzy-family checkpoints, this deliverable bundles a draft patch artifact: `vllm_patches/transformers_lizzy_tp.py`.

Apply this patch when all of the following are true:

- the runtime uses vLLM via the generic Transformers backend (`model_type=vllm`)
- tensor parallelism is enabled (`tensor_parallel_size > 1`)
- the checkpoint is Lizzy-family (including RLVR variants)
- the runtime is not guaranteed to include an equivalent upstream fix

You can skip patch bundling only for strict HF-only runs or single-rank vLLM (`TP=1`).
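
The four conditions above amount to a single boolean check. The helper below is a hypothetical sketch of that decision, not part of the bundled patch or of vLLM; every name in it is invented for illustration.

```python
def needs_lizzy_tp_patch(
    *,
    uses_vllm_transformers_backend: bool,
    tensor_parallel_size: int,
    is_lizzy_family: bool,
    upstream_fix_guaranteed: bool,
) -> bool:
    """Return True only when all four conditions for applying the patch hold."""
    return (
        uses_vllm_transformers_backend
        and tensor_parallel_size > 1
        and is_lizzy_family
        and not upstream_fix_guaranteed
    )


# Multi-GPU vLLM run of a Lizzy checkpoint with no guaranteed upstream fix: patch needed.
print(needs_lizzy_tp_patch(
    uses_vllm_transformers_backend=True,
    tensor_parallel_size=4,
    is_lizzy_family=True,
    upstream_fix_guaranteed=False,
))

# Single-rank vLLM (TP=1): patch can be skipped.
print(needs_lizzy_tp_patch(
    uses_vllm_transformers_backend=True,
    tensor_parallel_size=1,
    is_lizzy_family=True,
    upstream_fix_guaranteed=False,
))
```
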

Why this is included:

- it mitigates known Lizzy TP failure modes in generic vLLM Transformers loading
- it fixes rank-local head partitioning and `q_norm`/`k_norm` slicing behaviour
- it prevents the known tensor-shape crash class seen without this patch
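
To show why rank-local slicing matters, here is a simplified sketch of how attention heads and a norm weight might be partitioned across tensor-parallel ranks. It is not the bundled patch. It assumes the `q_norm`/`k_norm` weight spans all heads (length `num_heads * head_dim`), uses the 32 heads stated in this card, and picks a hypothetical `head_dim` of 128; the function names are invented for illustration.

```python
def rank_head_range(num_heads: int, tp_size: int, rank: int) -> range:
    """Return the contiguous block of head indices owned by one tensor-parallel rank."""
    if num_heads % tp_size != 0:
        raise ValueError("num_heads must be divisible by tp_size")
    if not 0 <= rank < tp_size:
        raise ValueError("rank out of range")
    per_rank = num_heads // tp_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def slice_norm_weight(weight: list, num_heads: int, head_dim: int, tp_size: int, rank: int) -> list:
    """Slice a norm weight covering all heads down to the part matching this rank's heads."""
    if len(weight) != num_heads * head_dim:
        raise ValueError("weight length must equal num_heads * head_dim")
    heads = rank_head_range(num_heads, tp_size, rank)
    return weight[heads.start * head_dim : heads.stop * head_dim]


NUM_HEADS = 32   # from the configuration points in this card
HEAD_DIM = 128   # hypothetical value for illustration
TP_SIZE = 4

full_weight = [float(i) for i in range(NUM_HEADS * HEAD_DIM)]
local = slice_norm_weight(full_weight, NUM_HEADS, HEAD_DIM, TP_SIZE, rank=1)
print(list(rank_head_range(NUM_HEADS, TP_SIZE, 1)))  # heads 8 through 15
print(len(local), local[0])                          # 1024 elements, starting at offset 1024.0
```

Under these assumptions, a rank that holds only 8 of the 32 heads produces a projection of width 1024. Applying the full 4096-element norm weight to it would be a shape mismatch, which is the kind of crash the list above refers to.
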