argyrotsipi committed
Commit 5262c26 · verified · 1 Parent(s): 48c01d8

Upload 6 files

Files changed (6)
  1. README.md +164 -6
  2. app.py +534 -0
  3. prompt_templates.py +137 -0
  4. requirements.txt +8 -0
  5. sample_data.json +152 -0
  6. utils.py +292 -0
README.md CHANGED
@@ -1,11 +1,169 @@
  ---
- title: ParliaBench
- emoji: 📊
- colorFrom: pink
+ title: ParliaBench Demo
+ emoji: 🏛️
+ colorFrom: blue
  colorTo: indigo
- sdk: docker
- pinned: false
+ sdk: gradio
+ sdk_version: 4.36.0
+ app_file: app.py
+ pinned: true
  license: mit
+ tags:
+ - nlp
+ - text-generation
+ - political-speech
+ - parliamentary
+ - fine-tuning
+ - lora
+ - qlora
+ - benchmark
+ - peft
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🏛️ ParliaBench — UK Parliamentary Speech Generation
+
+ Interactive inference demo for **ParliaBench**, a benchmark framework for evaluating
+ LLM-generated UK parliamentary speeches.
+
+ **Paper:** *ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech*
+ **Authors:** Marios Koniaris, Argyro Tsipi, Panayiotis Tsanakas · NTUA
+ **arXiv:** [2511.08247](https://arxiv.org/abs/2511.08247)
+
+ ---
+
+ ## What This Space Does
+
+ 1. **Generate synthetic parliamentary speeches** conditioned on party, EuroVoc topic,
+    parliamentary section, house, and political orientation
+ 2. **Inspect the exact prompt** sent to each model (chat template tokens included)
+ 3. **Browse curated samples** — synthetic vs real ParlaMint-GB speeches side by side
+ 4. **Tune generation parameters** (temperature, top-p, repetition penalty)
+
+ ---
+
+ ## Models
+
+ Five LLMs fine-tuned on ParlaMint-GB via **QLoRA** (Unsloth framework):
+
+ | Display name | Base model (Unsloth 4-bit) | Fine-tuned repo |
+ |---|---|---|
+ | Mistral-7B | `unsloth/mistral-7b-v0.3-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-mistral-7b-v0.3`](https://huggingface.co/argyrotsipi/parliabench-unsloth-mistral-7b-v0.3) |
+ | Llama-3.1-8B | `unsloth/Meta-Llama-3.1-8B-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-llama-3.1-8b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-llama-3.1-8b) |
+ | Gemma-2-9B | `unsloth/gemma-2-9b-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-gemma-2-9b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-gemma-2-9b) |
+ | Qwen2-7B | `unsloth/Qwen2-7B-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-qwen-2-7b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-qwen-2-7b) |
+ | Yi-1.5-**6B** | `unsloth/Yi-1.5-6B-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-yi-1.5-6b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-yi-1.5-6b) |
+
+ Baseline (non-fine-tuned) versions are also selectable for direct comparison.
+
+ ---
+
+ ## Datasets
+
+ | Repo | Contents |
+ |---|---|
+ | [`argyrotsipi/train-dataset`](https://huggingface.co/datasets/argyrotsipi/train-dataset) | ParlaMint-GB training split (preprocessed) |
+ | [`argyrotsipi/generated-dataset`](https://huggingface.co/datasets/argyrotsipi/generated-dataset) | 27 560 generated speeches + evaluation results |
+
+ ---
+
+ ## LoRA Training Configuration
+
+ | Parameter | Value |
+ |---|---|
+ | LoRA rank (r) | 16 |
+ | LoRA alpha | 16 |
+ | Target modules | q, k, v, o, gate, up, down projections |
+ | Dropout | 0 |
+ | Batch size | 64 |
+ | Learning rate | 2e-4 |
+ | Optimizer | AdamW fused |
+ | Max steps | 11 194 (~2 epochs) |
+ | Warmup steps | 336 |
+ | Max seq length | 1 024 |
+ | Framework | Unsloth + SFTTrainer (TRL) |
+
+ ---
+
+ ## Prompt Structure
+
+ **System prompt** (generation):
+ ```
+ You are a seasoned UK parliamentary member. Generate a coherent speech of
+ {min_words}-{max_words} words in standard English (no Unicode artifacts, no special characters).
+ Use proper British parliamentary language appropriate for the specified House.
+ The speech should reflect the political orientation and typical positions of the
+ specified party on the given topic.
+ ```
+
+ **Context string** (pipe-separated, matches generation code exactly):
+ ```
+ EUROVOC TOPIC: {topic} | SECTION: {section} | PARTY: {party} | POLITICAL ORIENTATION: {orientation} | HOUSE: {house}
+ ```
+
+ Each model wraps these in its own chat template (Mistral `[INST]`, Llama header tokens,
+ Gemma `<start_of_turn>`, Qwen/Yi ChatML).
+
+ ---
+
+ ## Generation Parameters (thesis defaults)
+
+ | Parameter | Value | Notes |
+ |---|---|---|
+ | Temperature | 0.7 | Balances coherence and diversity |
+ | Top-p | 0.85 | Nucleus sampling |
+ | Repetition penalty | 1.2 | Penalises redundant phrasing |
+ | Max new tokens | 850 | 1.33 × P90 speech length |
+ | Min words (P10) | 43 | Lower quality threshold |
+ | Max words (P90) | 635 | Upper quality threshold |
+ | Batch size | 32 | Used in full generation runs |
+
+ ---
+
+ ## Evaluation Framework
+
+ 27 560 speeches evaluated across three dimensions:
+
+ ### Linguistic Quality
+ Perplexity · Self-BLEU · Distinct-n · GRUEN Score · BERTScore · MoverScore
+
+ ### Semantic Coherence
+ LLM-as-a-Judge (coherence, conciseness, relevance) via FlowJudge-v0.1 (3.8B)
+
+ ### Political Authenticity ← novel metrics
+ - **Political Spectrum Alignment (PSA)** — embedding cosine similarity to spectrum axis
+ - **Party Alignment** — cosine similarity to real party speech embeddings
+ - LLM-as-a-Judge (authenticity, political appropriateness, overall quality)
+
+ Statistical analysis: paired t-tests, independent t-tests, one-way ANOVA, Bonferroni correction.
+
+ ---
+
+ ## Space File Structure
+
+ ```
+ argyrotsipi/ParliaBench/
+ ├── app.py              # Gradio UI + inference pipeline
+ ├── utils.py            # Party data, topic lists, validator (from SpeechValidator)
+ ├── prompt_templates.py # Model-specific chat templates (exact from speech_generator.py)
+ ├── sample_data.json    # 7 curated demo speeches
+ ├── requirements.txt
+ └── README.md
+ ```
+
+ Training code → GitHub
+ Datasets → [`argyrotsipi/train-dataset`](https://huggingface.co/datasets/argyrotsipi/train-dataset) · [`argyrotsipi/generated-dataset`](https://huggingface.co/datasets/argyrotsipi/generated-dataset)
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @article{koniaris2025parliabench,
+   title   = {ParliaBench: An Evaluation and Benchmarking Framework for
+              LLM-Generated Parliamentary Speech},
+   author  = {Koniaris, Marios and Tsipi, Argyro and Tsanakas, Panayiotis},
+   journal = {arXiv preprint arXiv:2511.08247},
+   year    = {2025},
+   url     = {https://arxiv.org/abs/2511.08247}
+ }
+ ```
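The pipe-separated context string above can be sketched as a small standalone helper. This is a minimal sketch of the documented format only; the Space's real builder is `build_context_string` in `utils.py` (not shown in this commit view), and the `Centre-right` orientation value below is an illustrative assumption, not taken from the repo:

```python
def build_context(party: str, topic: str, section: str,
                  orientation: str, house: str) -> str:
    # Field order matches the documented context string exactly.
    return (f"EUROVOC TOPIC: {topic} | SECTION: {section} | PARTY: {party} | "
            f"POLITICAL ORIENTATION: {orientation} | HOUSE: {house}")

ctx = build_context(
    party="Conservative",
    topic="ENERGY",
    section="Domestic Renewable Energy",
    orientation="Centre-right",   # illustrative value, not from the repo
    house="House of Commons",
)
print(ctx)
```

Keeping the field order and spacing identical to the training-time format matters, since the fine-tuned models were conditioned on exactly this layout.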
app.py ADDED
@@ -0,0 +1,534 @@
+ """
+ ParliaBench Demo — Hugging Face Space
+ Interactive inference demo for LLM-generated UK parliamentary speeches.
+
+ Based on:
+     "ParliaBench: An Evaluation and Benchmarking Framework for
+     LLM-Generated Parliamentary Speech"
+     Argyro Tsipi, NTUA Diploma Thesis, October 2025
+
+ Repos:
+     Models  → argyro/parliabench-{model}-lora
+     Dataset → argyro/parliabench-gb-processed
+     Space   → argyro/parliabench-demo
+ """
+
+ import json
+ import re
+ import time
+
+ import gradio as gr
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ from utils import (
+     PARTIES, EUROVOC_TOPICS, HOUSES, MODELS, MODEL_FAMILY, MODEL_CONFIG,
+     DEFAULT_GEN_PARAMS, get_valid_houses, get_orientation,
+     build_context_string, count_tokens_approx, validate_speech,
+ )
+ from prompt_templates import build_full_prompt
+
+ # ─── Model cache ──────────────────────────────────────────────────────────────
+ _model_cache: dict = {}
+
+
+ def _load_model_and_tokenizer(model_display_name: str):
+     """Load (and cache) model + tokenizer for the given display name."""
+     if model_display_name in _model_cache:
+         return _model_cache[model_display_name]
+
+     repo_id = MODELS[model_display_name]
+     family = MODEL_FAMILY[model_display_name]
+     is_ft = "fine-tuned" in model_display_name
+     base_repo = MODEL_CONFIG[family]["base_model"]
+
+     tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
+     if tokenizer.pad_token is None:
+         tokenizer.pad_token = tokenizer.eos_token
+
+     device_map = "auto" if torch.cuda.is_available() else None
+     dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+     if is_ft:
+         # Load base model, then apply LoRA adapter
+         from peft import PeftModel
+         base = AutoModelForCausalLM.from_pretrained(
+             base_repo, torch_dtype=dtype, device_map=device_map,
+             trust_remote_code=True,
+         )
+         model = PeftModel.from_pretrained(base, repo_id)
+     else:
+         model = AutoModelForCausalLM.from_pretrained(
+             repo_id, torch_dtype=dtype, device_map=device_map,
+             trust_remote_code=True,
+         )
+
+     model.eval()
+     _model_cache[model_display_name] = (model, tokenizer)
+     return model, tokenizer
+
+
+ # ─── Speech extraction (mirrors extract_speech in speech_generator.py) ────────
+
+ def _extract_speech(raw_text: str, family: str) -> str:
+     """Extract clean speech from raw decoded model output."""
+     cfg = MODEL_CONFIG[family]
+
+     # Find start marker
+     start = cfg["start_marker"]
+     if start in raw_text:
+         parts = raw_text.split(start)
+         speech = parts[-1].lstrip("\n")
+     else:
+         speech = raw_text
+
+     # Truncate at end marker
+     for em in cfg["end_markers"]:
+         if em in speech:
+             speech = speech.split(em)[0]
+             break
+
+     # Remove special tokens
+     for tok in cfg["special_tokens_to_remove"]:
+         speech = speech.replace(tok, "")
+
+     # Remove template artefacts
+     for art in ["Context:", "Instruction:", "EUROVOC TOPIC:", "SECTION:",
+                 "PARTY:", "POLITICAL ORIENTATION:", "HOUSE:",
+                 "\nuser", "\nassistant", "\nsystem"]:
+         if art in speech:
+             speech = speech.split(art)[0]
+
+     # Strip meta-commentary prefixes
+     _strip_prefixes = [
+         "Thank you for providing", "Thank you for your instruction",
+         "Here is my speech:", "Here is my response:", "Response:",
+         "Based on your specifications", "Based on the context provided",
+     ]
+     sl = speech.lower()
+     for prefix in _strip_prefixes:
+         if sl.startswith(prefix.lower()):
+             if prefix.endswith(":"):
+                 speech = speech[len(prefix):].lstrip()
+             else:
+                 cut = speech.find("\n\n")
+                 if 0 < cut < 200:
+                     speech = speech[cut + 2:].strip()
+                 else:
+                     cut = speech.find("\n")
+                     if 0 < cut < 150:
+                         speech = speech[cut + 1:].strip()
+             break
+
+     # Llama reserved tokens
+     speech = re.sub(r"<\|reserved_special_token_\d+\|>", "", speech)
+     speech = re.sub(r"<\|[^|]*\|>", "", speech)
+
+     # Whitespace
+     speech = re.sub(r"\n{3,}", "\n\n", speech)
+     speech = re.sub(r" {2,}", " ", speech)
+     speech = speech.strip()
+
+     # Leading punctuation artefacts
+     speech = re.sub(r"^[^\w\s\"'(]+", "", speech).lstrip()
+     speech = re.sub(r"^\.{2,}\s*", "", speech)
+
+     # HTML tags / trailing dashes
+     speech = re.sub(r"</?[a-zA-Z][^>]*>", "", speech)
+     speech = re.sub(r"----+\s*\.?\s*$", "", speech)
+
+     # Qwen: literal escape sequences
+     if "\\n" in speech or "\\t" in speech:
+         speech = speech.replace("\\n", "\n").replace("\\t", " ")
+
+     # Markdown
+     speech = re.sub(r"^#+\s+", "", speech)
+     speech = re.sub(r"\n#+\s+", "\n", speech)
+     speech = re.sub(r"\n?```\.?", "", speech)
+     speech = speech.strip()
+
+     # Final punctuation
+     if speech and not speech.endswith((".", "!", "?", '"', "'")):
+         speech = speech.rstrip() + "."
+
+     return speech
+
+
+ # ─── Main generation function ─────────────────────────────────────────────────
+
+ def generate_speech(
+     model_display_name: str,
+     party: str,
+     topic: str,
+     section: str,
+     house: str,
+     instruction_input: str,
+     temperature: float,
+     top_p: float,
+     repetition_penalty: float,
+     max_new_tokens: int,
+     min_words: int,
+     max_words: int,
+ ):
+     """Generate a parliamentary speech and return (speech, prompt, stats, params)."""
+     family = MODEL_FAMILY[model_display_name]
+     cfg = MODEL_CONFIG[family]
+
+     instruction = (instruction_input.strip()
+                    if instruction_input and instruction_input.strip()
+                    else f"Address the debate on {section} on {topic}.")
+
+     full_prompt = build_full_prompt(
+         model_family=family,
+         party=party,
+         topic=topic,
+         section=section,
+         house=house,
+         instruction=instruction,
+         min_words=int(min_words),
+         max_words=int(max_words),
+     )
+     prompt_tokens = count_tokens_approx(full_prompt)
+
+     try:
+         model, tokenizer = _load_model_and_tokenizer(model_display_name)
+     except Exception as exc:
+         return (
+             f"⚠️ Model loading failed:\n{exc}\n\n"
+             "Make sure the model repository exists on Hugging Face "
+             "and you have sufficient GPU memory (≥16 GB recommended).",
+             full_prompt,
+             "*Model loading error — see output above.*",
+             "",
+         )
+
+     inputs = tokenizer([full_prompt], return_tensors="pt").to(model.device)
+     in_len = inputs["input_ids"].shape[-1]
+     pad_id = tokenizer.pad_token_id or tokenizer.eos_token_id
+
+     t0 = time.time()
+     with torch.no_grad():
+         out_ids = model.generate(
+             **inputs,
+             max_new_tokens=int(max_new_tokens),
+             do_sample=True,
+             temperature=float(temperature),
+             top_p=float(top_p),
+             repetition_penalty=float(repetition_penalty),
+             pad_token_id=pad_id,
+             eos_token_id=tokenizer.eos_token_id,
+             stop_strings=cfg["stop_strings"],
+             tokenizer=tokenizer,
+             use_cache=True,
+         )
+     elapsed = time.time() - t0
+
+     raw = tokenizer.decode(out_ids[0], skip_special_tokens=False)
+     speech = _extract_speech(raw, family)
+
+     is_valid, reason = validate_speech(speech, int(min_words), int(max_words))
+     wc = len(speech.split())
+
+     stats = (
+         f"**Tokens in prompt:** ~{prompt_tokens} | "
+         f"**Words generated:** {wc} | "
+         f"**Time:** {elapsed:.1f}s | "
+         f"**Validation:** {'✅ ' + reason if is_valid else '⚠️ ' + reason}"
+     )
+     params_used = (
+         f"temperature={temperature}, top_p={top_p}, "
+         f"repetition_penalty={repetition_penalty}, max_new_tokens={max_new_tokens}"
+     )
+
+     return speech, full_prompt, stats, params_used
+
+
+ # ─── Sample gallery ───────────────────────────────────────────────────────────
+ with open("sample_data.json") as _f:
+     SAMPLES = json.load(_f)
+
+
+ def _render_sample(s: dict) -> str:
+     if s.get("is_real"):
+         tag = "🏛️ Real speech (ParlaMint-GB)"
+     elif s.get("is_finetuned"):
+         tag = "✅ Synthetic — Fine-tuned"
+     else:
+         tag = "⬜ Synthetic — Baseline"
+
+     table_ref = f" *(Thesis {s['table']})*" if s.get("table") else ""
+     return (
+         f"### {tag}{table_ref}\n\n"
+         f"| | |\n|---|---|\n"
+         f"| **Party** | {s['party']} |\n"
+         f"| **Topic** | {s['topic']} |\n"
+         f"| **Section** | {s['section']} |\n"
+         f"| **House** | {s['house']} |\n"
+         f"| **Orientation** | {s['orientation']} |\n"
+         f"| **Model** | {s['model']} |\n"
+         f"| **Words** | {s['word_count']} |\n\n"
+         f"---\n\n{s['speech']}"
+     )
+
+
+ # ─── Dynamic UI helpers ───────────────────────────────────────────────────────
+ def _update_house(party):
+     valid = get_valid_houses(party)
+     return gr.update(choices=valid, value=valid[0])
+
+
+ def _update_orientation(party):
+     return gr.update(value=get_orientation(party))
+
+
+ # ─── Gradio app ───────────────────────────────────────────────────────────────
+ CSS = """
+ #title { text-align: center; margin-bottom: .4em; }
+ #sub { text-align: center; color: #666; margin-bottom: 1.4em; font-size: .9em; }
+ #speech textarea { font-size: .95em; line-height: 1.65; }
+ #prompt textarea { font-family: monospace; font-size: .78em; }
+ """
+
+ with gr.Blocks(css=CSS, title="ParliaBench Demo") as demo:
+
+     gr.Markdown("# 🏛️ ParliaBench — UK Parliamentary Speech Generation",
+                 elem_id="title")
+     gr.Markdown(
+         "Inference demo for five LLMs fine-tuned on **ParlaMint-GB** with QLoRA \n"
+         "Koniaris, Tsipi & Tsanakas · [arXiv:2511.08247](https://arxiv.org/abs/2511.08247) · NTUA 2025",
+         elem_id="sub",
+     )
+
+     with gr.Tabs():
+
+         # ── Tab 1: Generate ───────────────────────────────────────────────────
+         with gr.Tab("🎙️ Generate Speech"):
+             with gr.Row():
+
+                 # Left panel — controls
+                 with gr.Column(scale=1):
+                     gr.Markdown("### 🔧 Configuration")
+
+                     model_select = gr.Dropdown(
+                         choices=list(MODELS.keys()),
+                         value="Llama-3.1-8B (fine-tuned)",
+                         label="Model",
+                         info="Fine-tuned = QLoRA adapter on Unsloth base; "
+                              "Baseline = raw 4-bit base model",
+                     )
+
+                     with gr.Group():
+                         party_select = gr.Dropdown(
+                             choices=PARTIES,
+                             value="Conservative",
+                             label="Party",
+                         )
+                         orientation_box = gr.Textbox(
+                             value=get_orientation("Conservative"),
+                             label="Political Orientation (auto-filled)",
+                             interactive=False,
+                         )
+                         house_select = gr.Dropdown(
+                             choices=HOUSES,
+                             value="House of Commons",
+                             label="House",
+                             info="Some parties are restricted to the Lords",
+                         )
+
+                     topic_select = gr.Dropdown(
+                         choices=EUROVOC_TOPICS,
+                         value="POLITICS",
+                         label="EuroVoc Topic",
+                         info="21 domains from the EUROVOC thesaurus",
+                     )
+                     section_input = gr.Textbox(
+                         value="National Health Service Funding",
+                         label="Debate Section / Bill Title",
+                         placeholder="e.g. Climate Change Act, Defence Procurement…",
+                     )
+                     instruction_input = gr.Textbox(
+                         label="Custom Instruction (optional)",
+                         placeholder=(
+                             "Leave blank for generic instruction, or enter a "
+                             "specific question/prompt from the debate…"
+                         ),
+                         lines=2,
+                     )
+
+                     gr.Markdown("### ⚙️ Generation Parameters")
+                     temperature = gr.Slider(0.1, 1.5,
+                                             value=DEFAULT_GEN_PARAMS["temperature"],
+                                             step=0.05, label="Temperature")
+                     top_p = gr.Slider(0.5, 1.0,
+                                       value=DEFAULT_GEN_PARAMS["top_p"],
+                                       step=0.05, label="Top-p (nucleus sampling)")
+                     rep_penalty = gr.Slider(1.0, 2.0,
+                                             value=DEFAULT_GEN_PARAMS["repetition_penalty"],
+                                             step=0.05, label="Repetition Penalty")
+                     max_new_toks = gr.Slider(100, 850,
+                                              value=500, step=50,
+                                              label="Max New Tokens")
+                     with gr.Row():
+                         min_words = gr.Number(value=DEFAULT_GEN_PARAMS["min_words"],
+                                               label="Min Words", precision=0)
+                         max_words = gr.Number(value=300,
+                                               label="Max Words (demo cap)", precision=0)
+
+                     gen_btn = gr.Button("🎤 Generate Speech",
+                                         variant="primary", size="lg")
+
+                 # Right panel — output
+                 with gr.Column(scale=2):
+                     gr.Markdown("### 📜 Generated Speech")
+
+                     speech_out = gr.Textbox(
+                         label="Output",
+                         lines=18,
+                         show_copy_button=True,
+                         elem_id="speech",
+                     )
+                     stats_out = gr.Markdown("*Stats will appear here after generation.*")
+                     params_out = gr.Textbox(label="Parameters Used",
+                                             interactive=False)
+
+                     with gr.Accordion("🔍 Full Prompt Sent to Model", open=False):
+                         prompt_out = gr.Textbox(
+                             label="Prompt (read-only)",
+                             lines=14,
+                             interactive=False,
+                             elem_id="prompt",
+                         )
+
+                     gr.Markdown(
+                         "---\n"
+                         "💡 The prompt panel shows the **exact input** fed to the model "
+                         "(including chat template tokens) — useful for reproducibility."
+                     )
+
+             # Wire up dynamic helpers
+             party_select.change(_update_house, party_select, house_select)
+             party_select.change(_update_orientation, party_select, orientation_box)
+
+             gen_btn.click(
+                 fn=generate_speech,
+                 inputs=[model_select, party_select, topic_select, section_input,
+                         house_select, instruction_input,
+                         temperature, top_p, rep_penalty, max_new_toks,
+                         min_words, max_words],
+                 outputs=[speech_out, prompt_out, stats_out, params_out],
+             )
+
+         # ── Tab 2: Sample Gallery ─────────────────────────────────────────────
+         with gr.Tab("📚 Sample Gallery"):
+             gr.Markdown(
+                 "### Example speeches from the thesis (Tables 7.4–7.13)\n"
+                 "All 10 examples used in the thesis: 5 baseline outputs and 5 fine-tuned outputs, "
+                 "one per model. Compare quality directly between baseline ⬜ and fine-tuned ✅ versions."
+             )
+             sample_choices = [
+                 f"{s['table']} — {'✅ FT' if s.get('is_finetuned') else '⬜ Base'} | {s['model']} | {s['party']} | {s['topic']}"
+                 for s in SAMPLES
+             ]
+             sample_sel = gr.Dropdown(
+                 choices=sample_choices,
+                 value=sample_choices[0],
+                 label="Select a sample",
+             )
+             sample_md = gr.Markdown(_render_sample(SAMPLES[0]))
+
+             def _show_sample(choice: str) -> str:
+                 # choice format: "Table 7.X — ..."
+                 table_ref = choice.split(" — ")[0].strip()  # e.g. "Table 7.4"
+                 for s in SAMPLES:
+                     if s.get("table") == table_ref:
+                         return _render_sample(s)
+                 return "Sample not found."
+
+             sample_sel.change(fn=_show_sample,
+                               inputs=sample_sel,
+                               outputs=sample_md)
+
+         # ── Tab 3: About ──────────────────────────────────────────────────────
+         with gr.Tab("ℹ️ About"):
+             gr.Markdown("""
+ ## About ParliaBench
+
+ **ParliaBench** is a benchmark and evaluation framework for LLM-generated
+ UK parliamentary speeches, developed as a Diploma Thesis at NTUA.
+
+ ### Five fine-tuned models (QLoRA via Unsloth)
+
+ | Model | Base | HF Repo |
+ |-------|------|---------|
+ | Mistral-7B | `unsloth/mistral-7b-v0.3-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-mistral-7b-v0.3`](https://huggingface.co/argyrotsipi/parliabench-unsloth-mistral-7b-v0.3) |
+ | Llama-3.1-8B | `unsloth/Meta-Llama-3.1-8B-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-llama-3.1-8b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-llama-3.1-8b) |
+ | Gemma-2-9B | `unsloth/gemma-2-9b-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-gemma-2-9b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-gemma-2-9b) |
+ | Qwen2-7B | `unsloth/Qwen2-7B-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-qwen-2-7b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-qwen-2-7b) |
+ | Yi-1.5-**6B** | `unsloth/Yi-1.5-6B-bnb-4bit` | [`argyrotsipi/parliabench-unsloth-yi-1.5-6b`](https://huggingface.co/argyrotsipi/parliabench-unsloth-yi-1.5-6b) |
+
+ ### LoRA training configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | LoRA rank (r) | 16 |
+ | LoRA alpha | 16 |
+ | Target modules | q, k, v, o, gate, up, down projections |
+ | Dropout | 0 |
+ | Batch size | 64 |
+ | Learning rate | 2e-4 |
+ | Optimizer | AdamW (fused) |
+ | Max steps | 11 194 (~2 epochs) |
+ | Warmup steps | 336 |
+ | Max seq length | 1 024 |
+
+ ### Context format used at generation time
+
+ ```
+ EUROVOC TOPIC: {topic} | SECTION: {section} | PARTY: {party} | POLITICAL ORIENTATION: {orientation} | HOUSE: {house}
+ ```
+
+ ### Evaluation framework (3 dimensions, 27 560 speeches)
+
+ **Linguistic quality:** Perplexity · Self-BLEU · Distinct-n · GRUEN · BERTScore · MoverScore
+ **Semantic coherence:** LLM-as-a-Judge (coherence, conciseness, relevance)
+ **Political authenticity:** *Political Spectrum Alignment (PSA)* · *Party Alignment* ← novel metrics
+ LLM-as-a-Judge (authenticity, political appropriateness, overall quality)
+
+ ### Generation parameters (thesis defaults)
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.7 |
+ | Top-p | 0.85 |
+ | Repetition penalty | 1.2 |
+ | Max new tokens | 850 |
+ | Min words (P10) | 43 |
+ | Max words (P90) | 635 |
+
+ ### Citation
+
+ ```bibtex
+ @article{koniaris2025parliabench,
+   title = {ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech},
+   author = {Koniaris, Marios and Tsipi, Argyro and Tsanakas, Panayiotis},
+   journal = {arXiv preprint arXiv:2511.08247},
+   year = {2025},
+   url = {https://arxiv.org/abs/2511.08247}
+ }
+ ```
+
+ *[arXiv:2511.08247](https://arxiv.org/abs/2511.08247) · National Technical University of Athens · School of Electrical and Computer Engineering*
+ """)
+
+     gr.Markdown(
+         "---\n"
+         "<small>ParliaBench Demo · NTUA 2025 · "
+         "[argyrotsipi on HF](https://huggingface.co/argyrotsipi) · "
+         "[Train dataset](https://huggingface.co/datasets/argyrotsipi/train-dataset) · "
+         "[Generated dataset](https://huggingface.co/datasets/argyrotsipi/generated-dataset)</small>"
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch(share=False)
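The marker-based extraction in `_extract_speech` depends on per-family values from `utils.MODEL_CONFIG`, which is not included in this commit view. A self-contained sketch of the same idea for the ChatML families (Qwen/Yi), with the config values assumed from the chat templates rather than taken from `utils.py`:

```python
import re

# Hypothetical minimal config for the ChatML families (Qwen/Yi); the real
# per-family values live in utils.MODEL_CONFIG, which this commit does not show.
CFG = {
    "start_marker": "<|im_start|>assistant\n",
    "end_markers": ["<|im_end|>"],
    "special_tokens_to_remove": ["<|endoftext|>"],
}

def extract_speech(raw: str) -> str:
    # Keep only the text after the assistant start marker.
    speech = raw.split(CFG["start_marker"])[-1] if CFG["start_marker"] in raw else raw
    # Truncate at the first end marker.
    for em in CFG["end_markers"]:
        if em in speech:
            speech = speech.split(em)[0]
            break
    # Drop leftover special tokens, collapse whitespace, ensure final punctuation.
    for tok in CFG["special_tokens_to_remove"]:
        speech = speech.replace(tok, "")
    speech = re.sub(r"\n{3,}", "\n\n", speech).strip()
    if speech and not speech.endswith((".", "!", "?", '"', "'")):
        speech += "."
    return speech

raw = ("<|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...<|im_end|>\n"
       "<|im_start|>assistant\nMr Speaker, I rise to speak<|im_end|><|endoftext|>")
print(extract_speech(raw))  # Mr Speaker, I rise to speak.
```

The full version in `app.py` additionally strips template artefacts, meta-commentary prefixes, HTML tags, and markdown, but the split-on-markers skeleton is the same.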
prompt_templates.py ADDED
@@ -0,0 +1,137 @@
1
+ """
2
+ ParliaBench Prompt Templates
3
+ Model-specific chat templates for parliamentary speech generation.
4
+
5
+ Exact templates from speech_generator.py (format_prompt_* functions).
6
+ System prompt from Config.SYSTEM_PROMPT in speech_generator.py.
7
+ """
8
+
9
+ from utils import build_context_string, get_orientation
10
+
11
+ # System prompt used at generation time (from Config.SYSTEM_PROMPT)
12
+ # The {min_words}-{max_words} range is inserted dynamically.
13
+ SYSTEM_PROMPT_TEMPLATE = (
14
+ "You are a seasoned UK parliamentary member. Generate a coherent speech of "
15
+ "{min_words}-{max_words} words in standard English (no Unicode artifacts, no special characters).\n"
16
+ "Use proper British parliamentary language appropriate for the specified House. "
17
+ "The speech should reflect the political orientation and typical positions of the "
18
+ "specified party on the given topic."
19
+ )
20
+
21
+ # System prompt used during training (no word-count constraint β€” from trainer.py)
22
+ TRAINING_SYSTEM_PROMPT = (
23
+ "You are a seasoned UK parliamentary member.\n"
24
+ "Use proper British parliamentary language appropriate for the specified House. "
25
+ "The speech should reflect the political orientation and typical positions of the "
26
+ "specified party on the given topic."
27
+ )
28
+
29
+
30
+ def get_system_prompt(min_words: int, max_words: int) -> str:
31
+ return SYSTEM_PROMPT_TEMPLATE.format(min_words=min_words, max_words=max_words)
32
+
33
+
34
+ # ─── Per-model format functions (exact from speech_generator.py) ──────────────
35
+
36
+ def format_prompt_mistral(context: str, instruction: str,
37
+ min_words: int, max_words: int) -> str:
38
+ system = get_system_prompt(min_words, max_words)
39
+ return (
40
+ f"<s>[INST] {system}\n\n"
41
+ f"Context: {context}\n"
42
+ f"Instruction: {instruction} [/INST] "
43
+ )
44
+
45
+
46
+ def format_prompt_llama(context: str, instruction: str,
47
+ min_words: int, max_words: int) -> str:
48
+ system = get_system_prompt(min_words, max_words)
49
+ return (
50
+ f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
51
+ f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
52
+ f"Context: {context}\n"
53
+         f"Instruction: {instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
+     )
+
+
+ def format_prompt_gemma(context: str, instruction: str,
+                         min_words: int, max_words: int) -> str:
+     system = get_system_prompt(min_words, max_words)
+     return (
+         f"<bos><start_of_turn>user\n"
+         f"{system}\n"
+         f"Context: {context}\n"
+         f"Instruction: {instruction}<end_of_turn>\n"
+         f"<start_of_turn>model\n"
+     )
+
+
+ def format_prompt_qwen(context: str, instruction: str,
+                        min_words: int, max_words: int) -> str:
+     system = get_system_prompt(min_words, max_words)
+     return (
+         f"<|im_start|>system\n{system}<|im_end|>\n"
+         f"<|im_start|>user\n"
+         f"Context: {context}\n"
+         f"Instruction: {instruction}<|im_end|>\n"
+         f"<|im_start|>assistant\n"
+     )
+
+
+ def format_prompt_yi(context: str, instruction: str,
+                      min_words: int, max_words: int) -> str:
+     system = get_system_prompt(min_words, max_words)
+     return (
+         f"<|im_start|>system\n{system}<|im_end|>\n"
+         f"<|im_start|>user\n"
+         f"Context: {context}\n"
+         f"Instruction: {instruction}<|im_end|>\n"
+         f"<|im_start|>assistant\n"
+     )
+
+
+ # Dispatch table (from FORMAT_FUNCTIONS in speech_generator.py)
+ FORMAT_FUNCTIONS = {
+     "mistral": format_prompt_mistral,
+     "llama": format_prompt_llama,
+     "gemma": format_prompt_gemma,
+     "qwen": format_prompt_qwen,
+     "yi": format_prompt_yi,
+ }
+
+
+ def build_full_prompt(model_family: str,
+                       party: str,
+                       topic: str,
+                       section: str,
+                       house: str,
+                       instruction: str,
+                       min_words: int,
+                       max_words: int) -> str:
+     """
+     Build the complete prompt string ready to be tokenized.
+
+     Args:
+         model_family: One of 'mistral', 'llama', 'gemma', 'qwen', 'yi'
+         party: Political party name
+         topic: EuroVoc topic (upper-case)
+         section: Parliamentary debate section / bill title
+         house: 'House of Commons' or 'House of Lords'
+         instruction: Task instruction or generic prompt
+         min_words / max_words: Word-count constraints for the system prompt
+
+     Returns:
+         Formatted prompt string
+     """
+     orientation = get_orientation(party)
+     context = build_context_string(party, topic, section, orientation, house)
+
+     if not instruction or not instruction.strip():
+         instruction = f"Address the debate on {section} on {topic}."
+
+     fmt_fn = FORMAT_FUNCTIONS.get(model_family)
+     if fmt_fn is None:
+         raise ValueError(f"Unknown model family: {model_family!r}. "
+                          f"Choose from: {list(FORMAT_FUNCTIONS)}")
+
+     return fmt_fn(context, instruction, min_words, max_words)
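The ChatML-style assembly used by `format_prompt_qwen` and `format_prompt_yi` can be sketched standalone. This is a minimal, self-contained illustration: `get_system_prompt` is defined earlier in `prompt_templates.py` (not shown in this diff hunk), so a stub with assumed wording stands in for it here.

```python
def stub_system_prompt(min_words: int, max_words: int) -> str:
    # Hypothetical stand-in for get_system_prompt(); the real text lives
    # earlier in prompt_templates.py.
    return (f"You are a UK parliamentary speechwriter. "
            f"Write between {min_words} and {max_words} words.")


def chatml_prompt(context: str, instruction: str,
                  min_words: int = 43, max_words: int = 635) -> str:
    # Same token layout as format_prompt_qwen / format_prompt_yi:
    # system turn, user turn with Context/Instruction, open assistant turn.
    system = stub_system_prompt(min_words, max_words)
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n"
        f"Context: {context}\n"
        f"Instruction: {instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


context = ("EUROVOC TOPIC: ENERGY | SECTION: Domestic Renewable Energy | "
           "PARTY: Scottish National Party | POLITICAL ORIENTATION: Centre-left | "
           "HOUSE: House of Commons")
prompt = chatml_prompt(context,
                       "Address the debate on Domestic Renewable Energy on ENERGY.")
print(prompt)
```

Note that the prompt deliberately ends with an open `<|im_start|>assistant\n` turn, so generation continues as the assistant; the `<|im_end|>` stop strings in `utils.py` then terminate it.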
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.36.0
+ transformers>=4.43.0
+ torch>=2.3.0
+ accelerate>=0.31.0
+ bitsandbytes>=0.43.0
+ peft>=0.11.0
+ sentencepiece>=0.2.0
+ protobuf>=4.25.0
sample_data.json ADDED
@@ -0,0 +1,152 @@
+ [
+   {
+     "id": "table_7_4",
+     "table": "Table 7.4",
+     "model": "Gemma-2-9B (baseline)",
+     "party": "Scottish National Party",
+     "topic": "ENERGY",
+     "section": "Domestic Renewable Energy",
+     "house": "House of Commons",
+     "orientation": "Centre-left",
+     "instruction": "Address the debate on Domestic Renewable Energy on ENERGY.",
+     "speech": "Mr Speaker, I rise today to speak about domestic renewable energy as it is vital that we ensure our country's future remains secure by reducing reliance on foreign imports while also protecting local communities from harmful emissions with new clean technologies like solar panels or wind turbines which can be installed across all parts of Scotland!",
+     "word_count": 56,
+     "is_real": false,
+     "is_finetuned": false
+   },
+   {
+     "id": "table_7_5",
+     "table": "Table 7.5",
+     "model": "Llama-3.1-8B (baseline)",
+     "party": "Conservative",
+     "topic": "INDUSTRY",
+     "section": "Hawk Aircraft: Overseas Promotion",
+     "house": "House of Commons",
+     "orientation": "Centre-right",
+     "instruction": "Address the debate on Hawk Aircraft: Overseas Promotion on INDUSTRY.",
+     "speech": "The government's recent decision to promote exports of our world-renowned aircraft manufacturing capabilities is both commendable and timely. As an advocate for economic growth and international trade, I fully support this initiative. Overseas promotion plays a crucial role in boosting Britain's economy by increasing export revenue. It also helps create jobs domestically while enhancing our nation's reputation as a global leader in innovation and engineering excellence. Hawk Aircraft has long been synonymous with quality craftsmanship, safety standards, and technological advancements that set it apart from its competitors worldwide. By highlighting these strengths during overseas promotional campaigns, we can secure lucrative contracts abroad, ensuring sustained employment opportunities here at home. The current market trends indicate growing demand for advanced fighter jets across various regions globally. With our exceptional track record, skilled workforce, and robust infrastructure supporting research & development initiatives – now more than ever before - there exists ample scope for us to capitalise upon emerging prospects beyond Europe too! In conclusion then; let me urge my esteemed colleagues within parliament today to rally behind efforts aimed towards fostering increased foreign sales of Hawks through strategic outreach programmes designed specifically tailored according to individual markets' needs thereby contributing significantly toward bolstering overall industrial output alongside creating much needed job openings amidst ongoing challenges faced due COVID19 pandemic.",
+     "word_count": 185,
+     "is_real": false,
+     "is_finetuned": false
+   },
+   {
+     "id": "table_7_6",
+     "table": "Table 7.6",
+     "model": "Mistral-7B (baseline)",
+     "party": "Conservative",
+     "topic": "ENVIRONMENT",
+     "section": "The Situation in the Gulf",
+     "house": "House of Commons",
+     "orientation": "Centre-right",
+     "instruction": "Address the debate on The Situation in the Gulf on ENVIRONMENT.",
+     "speech": "The Conservatives have long supported the EU's position as an independent global player, and we believe that this is particularly important in the context of environmental policy. We recognize the importance of working with our European partners to address issues such as climate change and biodiversity loss, but also acknowledge the need to work closely with countries outside Europe to achieve these goals. This is especially true when it comes to addressing the challenges posed by pollution, deforestation, and other threats to our planet. As such, I would like to take this opportunity to reiterate my support for the EU's commitment to leading the way on environmental protection and call upon all Member States to join us in making sure that the Gulf region remains safe from further damage caused by human activities. We must ensure that any trade agreements between the EU and its trading partners include strong provisions for protecting the environment, including measures to reduce greenhouse gas emissions and preserve natural resources. Additionally, we should continue to encourage dialogue with non-EU states on matters related to sustainable development so that they can benefit from our experience while avoiding potential conflicts over resource allocation or access rights. Finally, I urge my colleagues here today to consider how best we can leverage existing partnerships within both the GCC and Asia Pacific regions towards achieving greater cooperation across borders when tackling shared problems affecting regional ecosystems and economies alike.",
+     "word_count": 211,
+     "is_real": false,
+     "is_finetuned": false
+   },
+   {
+     "id": "table_7_7",
+     "table": "Table 7.7",
+     "model": "Qwen2-7B (baseline)",
+     "party": "Conservative",
+     "topic": "POLITICS",
+     "section": "Right-wing Extremism",
+     "house": "House of Commons",
+     "orientation": "Centre-right",
+     "instruction": "Address the debate on Right-wing Extremism on POLITICS.",
+     "speech": "The following is an example of how to write this type of speech: \"Mr Speaker, I rise today to address the issue of right-wing extremism within our society. As members of the Conservative Party, we believe that it is important to uphold traditional values and promote social cohesion while respecting diversity. However, there have been concerns raised about some individuals who hold extremist views which run counter to these principles. We must be clear - any form of hate or intolerance has absolutely no place in our country. We cannot allow those with extreme beliefs to divide us based on race, religion or ethnicity. Instead, we need to work together as one nation to build bridges between different communities and foster understanding. In order to combat right-wing extremism, I propose three key actions. Firstly, we need stronger laws against hate crimes so that those responsible can face justice more effectively. Secondly, education plays a vital role; schools must teach young people about tolerance and respect from an early age. Finally, increased engagement by law enforcement agencies will help identify potential threats before they escalate into violence.\" This speech reflects the centre-right position of the conservative party on the topic of right-wing extremism whilst still maintaining a positive tone towards building unity across society. It also highlights specific solutions proposed by the speaker such as tougher legislation, improved educational programs and greater police involvement.",
+     "word_count": 224,
+     "is_real": false,
+     "is_finetuned": false
+   },
+   {
+     "id": "table_7_8",
+     "table": "Table 7.8",
+     "model": "Yi-1.5-6B (baseline)",
+     "party": "Labour",
+     "topic": "AGRI-FOODSTUFFS",
+     "section": "Food Waste",
+     "house": "House of Commons",
+     "orientation": "Centre-left",
+     "instruction": "Address the debate on Food Waste on AGRI-FOODSTUFFS.",
+     "speech": "The EU has an obligation to reduce food waste through sustainable policies that ensure fair prices while also promoting social justice and environmental sustainability. As such, I call upon my fellow parliamentarians from across all parties to support this initiative as it is crucial not only for our planet but also for ensuring equitable access to healthy diets around Europe. This motion must be passed without delay so we can begin implementing these important changes today! Speech generated by AI assistant.",
+     "word_count": 75,
+     "is_real": false,
+     "is_finetuned": false
+   },
+   {
+     "id": "table_7_9",
+     "table": "Table 7.9",
+     "model": "Gemma-2-9B (fine-tuned)",
+     "party": "Conservative",
+     "topic": "LAW",
+     "section": "HM Courts and Tribunal Service",
+     "house": "House of Commons",
+     "orientation": "Centre-right",
+     "instruction": "Address the debate on HM Courts and Tribunal Service on LAW.",
+     "speech": "The hon. Gentleman is absolutely right to say that we must do everything possible — and I think everyone across this Chamber agrees with him about that — to ensure victims come forward as witnesses when they can provide evidence against perpetrators who have committed such crimes; but he will also know that there has been an enormous amount of work done over many years by successive Governments since at least the early '90s or later to improve how our justice system deals with vulnerable people so that it does not put them off coming into court. As part of these proposals, which were consulted upon very extensively last year, one thing we wanted was more specialist courts where all those involved could be trained up properly to deal sensitively with vulnerable victims, especially children. There would then be only three types of Crown Court sitting centre around England and Wales, each providing different levels of expertise depending on what type of case was being heard there. That includes dealing with sexual offences cases through sex offender treatment programmes, domestic violence offenders going back time after time before magistrates' courts because their partners just cannot get away from them, complex fraudsters whose activities go beyond anything else happening locally within any other single jurisdiction area including county lines drug dealers working out whether someone needs help rather than prosecution if addiction problems make sure she goes down another route towards recovery instead. We believe strongly too much focus remains placed solely onto sentencing outcomes without enough attention paid equally important issues relating thereto during trial proceedings themselves. First though let me remind you why reform matters here. Firstly unlike most European countries ours still uses outdated technology often reliant largely manual processes. Secondly despite having fewer judges per capita compared internationally costs continue rising annually partly due inefficient use resources. Thirdly although overall number defendants decreasing significantly certain categories remain stubbornly higher e.g. serious violent crime. Fourthly finally while public safety obviously paramount nobody wants see unnecessary burden place taxpayers pockets unnecessarily either therefore government determined find balance between protecting society ensuring value money spent means delivering better services faster cheaper way.",
+     "word_count": 298,
+     "is_real": false,
+     "is_finetuned": true
+   },
+   {
+     "id": "table_7_10",
+     "table": "Table 7.10",
+     "model": "Llama-3.1-8B (fine-tuned)",
+     "party": "Conservative",
+     "topic": "INTERNATIONAL ORGANISATIONS",
+     "section": "UK-EU Trade Policy Options",
+     "house": "House of Commons",
+     "orientation": "Centre-right",
+     "instruction": "Address the debate on UK-EU Trade Policy Options on INTERNATIONAL ORGANISATIONS.",
+     "speech": "I am grateful to my right hon. Friend for his response. As he will be aware, I have long campaigned against our membership of the European Union because we were told that it was not possible to leave without damaging ourselves economically; yet here we are with an economy growing faster than any other major advanced country's. Does this not prove that those who warned us about economic damage if we left were wrong? Will my right hon. Friend assure me that when negotiations start there will be no question but that we must take back control over laws relating to trade policy?",
+     "word_count": 100,
+     "is_real": false,
+     "is_finetuned": true
+   },
+   {
+     "id": "table_7_11",
+     "table": "Table 7.11",
+     "model": "Mistral-7B (fine-tuned)",
+     "party": "Conservative",
+     "topic": "INTERNATIONAL RELATIONS",
+     "section": "Nuclear Weapons (International Relations Committee Report)",
+     "house": "House of Commons",
+     "orientation": "Centre-right",
+     "instruction": "Address the debate on Nuclear Weapons (International Relations Committee Report) on INTERNATIONAL RELATIONS.",
+     "speech": "2018 is a year of major anniversaries in our history of nuclear weapons. It marks 70 years since atomic bombs were dropped on Japan; it also marks 70 years since NATO was established — I am sure that will be remembered by many Members on both sides of this House; and it also marks 70 years since the UN General Assembly adopted its first resolution calling for an end to nuclear testing. In addition, we mark this year the start of negotiations between President Reagan and Mikhail Gorbachev towards reducing nuclear stockpiles. I commend my noble Friend Lord Howell for his work as Chair of the International Relations Select Committee, which produced the report on nuclear deterrence, \"Time to Stop the Clock\", earlier this year. He said at the time that he hoped that the Government would give serious consideration to all aspects of this important issue. As we have heard today from him and others who spoke eloquently about these issues during this debate, the committee's report contains several key recommendations that I hope the Minister will respond to. These include calls to support multilateral disarmament efforts, particularly through the conference on the humanitarian impact of nuclear weapons in December; to ensure that any future strategic defence review includes proposals to reduce the number of operational warheads carried by each Trident submarine, not just those stored separately ashore; to increase funding for the alternative delivery vehicle programme; to continue to engage with other states such as China, India and Pakistan to encourage them to join the global norm against the use or threat of use of nuclear weapons; and to provide regular updates to Parliament on progress made on implementing the Government's non-proliferation treaty obligations.",
+     "word_count": 252,
+     "is_real": false,
+     "is_finetuned": true
+   },
+   {
+     "id": "table_7_12",
+     "table": "Table 7.12",
+     "model": "Qwen2-7B (fine-tuned)",
+     "party": "Labour",
+     "topic": "POLITICS",
+     "section": "Hong Kong Arrests Under National Security Law",
+     "house": "House of Lords",
+     "orientation": "Centre-left",
+     "instruction": "Address the debate on Hong Kong Arrests Under National Security Law on POLITICS.",
+     "speech": "My Lords, I am grateful to my noble friend Lord Collins for tabling this Urgent Question today. I associate myself with his comments about human rights lawyer Maya Wang being detained by Chinese authorities last week. It is truly shocking that we have seen more than two dozen people arrested under China's national security law over recent days alone; some were simply trying to exercise their right to free assembly or speak out against the actions of Beijing. This is just one example of how the national security law has been used as an instrument of repression since it came into force late last year. We know from Amnesty International and Human Rights Watch reports that the arrests show the full extent of the chilling effect of the draconian new laws passed by Beijing in July, which threaten freedom of expression, association and peaceful protest among other things. These latest developments come after a year marked by escalating threats to civil liberties, including widespread police brutality during pro-democracy protests across Hong Kong. As well as the detention of Ms Wang, there was also the arrest earlier this month of Joshua Wong — another young activist who had recently turned down a scholarship at Harvard University because he believed that education could not be separated from politics. Are the Government concerned that these latest arrests demonstrate continued disregard for basic freedoms?",
+     "word_count": 212,
+     "is_real": false,
+     "is_finetuned": true
+   },
+   {
+     "id": "table_7_13",
+     "table": "Table 7.13",
+     "model": "Yi-1.5-6B (fine-tuned)",
+     "party": "Scottish National Party",
+     "topic": "INTERNATIONAL RELATIONS",
+     "section": "Flooding: EU Solidarity Fund",
+     "house": "House of Commons",
+     "orientation": "Centre-left",
+     "instruction": "Address the debate on Flooding: EU Solidarity Fund on INTERNATIONAL RELATIONS.",
+     "speech": "I thank the Minister for his response to my urgent question earlier today about this issue. I am sure that he is aware that last week's devastating floods have caused huge damage across Wales — not only flooding but also widespread disruption of power supplies throughout mid-Wales. As well as seeking support from our own Government here at home, can we seek solidarity with those affected by these events? Can he update us further on what discussions there will be with other European Governments over how they might help financially or otherwise with such devastation? Will he make representations when it comes time for any applications to be made so that we get maximum financial benefit out of them? Finally, does he agree that while Brexit means leaving Europe physically, not joining together through co-operation makes absolutely zero sense economically, environmentally or socially? This shows why Scotland would rather remain part of Europe than leave altogether.",
+     "word_count": 156,
+     "is_real": false,
+     "is_finetuned": true
+   }
+ ]
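The records above share a flat schema, so the demo can slice them by condition (baseline vs. fine-tuned, party, house) with plain list comprehensions. A minimal sketch, using two abbreviated records inline; in the Space the full list would come from `json.load(open("sample_data.json"))`:

```python
# Abbreviated copies of two records following the sample_data.json schema.
records = [
    {"id": "table_7_4", "model": "Gemma-2-9B (baseline)",
     "party": "Scottish National Party", "word_count": 56,
     "is_finetuned": False},
    {"id": "table_7_9", "model": "Gemma-2-9B (fine-tuned)",
     "party": "Conservative", "word_count": 298,
     "is_finetuned": True},
]

# Split by training condition and compare average speech length.
finetuned = [r for r in records if r["is_finetuned"]]
baseline = [r for r in records if not r["is_finetuned"]]
avg_wc = sum(r["word_count"] for r in records) / len(records)

print([r["id"] for r in finetuned], [r["id"] for r in baseline], avg_wc)
```

The same pattern drives the Space's example picker: filter on `model`, `party`, or `table` and feed the matching record's fields into the prompt builder.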
utils.py ADDED
@@ -0,0 +1,292 @@
+ """
+ ParliaBench Utilities
+ Party data, topic lists, orientation mappings, inference helpers, and validation.
+ Source: speech_generator.py / trainer.py — ParliaBench NTUA 2025
+ """
+
+ import re
+
+ # ─── Party data (from Config.PARTY_DISTRIBUTION in speech_generator.py) ───────
+ PARTY_DISTRIBUTION = {
+     "Conservative": {"weight": 0.59, "orientation": "Centre-right"},
+     "Labour": {"weight": 0.24, "orientation": "Centre-left"},
+     "Scottish National Party": {"weight": 0.05, "orientation": "Centre-left"},
+     "Liberal Democrats": {"weight": 0.05, "orientation": "Centre to centre-left"},
+     "Crossbench": {"weight": 0.028, "orientation": "Unknown"},
+     "Democratic Unionist Party": {"weight": 0.016, "orientation": "Right"},
+     "Independent": {"weight": 0.01, "orientation": "Unknown"},
+     "Plaid Cymru": {"weight": 0.006, "orientation": "Centre-left to left"},
+     "Green Party": {"weight": 0.005, "orientation": "Left"},
+     "Non-Affiliated": {"weight": 0.003, "orientation": "Unknown"},
+     "Bishops": {"weight": 0.002, "orientation": "Unknown"},
+ }
+
+ PARTIES = list(PARTY_DISTRIBUTION.keys())
+
+ # House restrictions (from Config.COMMONS_PARTIES / Config.LORDS_PARTIES)
+ COMMONS_PARTIES = [
+     "Conservative", "Labour", "Scottish National Party", "Liberal Democrats",
+     "Democratic Unionist Party", "Independent", "Plaid Cymru", "Green Party",
+ ]
+ LORDS_PARTIES = [
+     "Conservative", "Labour", "Liberal Democrats", "Crossbench",
+     "Non-Affiliated", "Green Party", "Bishops", "Independent",
+     "Plaid Cymru", "Democratic Unionist Party",
+ ]
+
+ # ─── EuroVoc topic categories (from Config.EUROVOC_TOPICS) ────────────────────
+ EUROVOC_TOPICS = [
+     "POLITICS", "LAW", "AGRICULTURE, FORESTRY AND FISHERIES",
+     "ENERGY", "ECONOMICS", "ENVIRONMENT", "SOCIAL QUESTIONS",
+     "EDUCATION AND COMMUNICATIONS", "EMPLOYMENT AND WORKING CONDITIONS",
+     "TRANSPORT", "INTERNATIONAL RELATIONS", "TRADE",
+     "PRODUCTION, TECHNOLOGY AND RESEARCH", "EUROPEAN UNION",
+     "SCIENCE", "GEOGRAPHY", "FINANCE", "BUSINESS AND COMPETITION",
+     "INDUSTRY", "AGRI-FOODSTUFFS", "INTERNATIONAL ORGANISATIONS",
+ ]
+
+ # ─── Houses ───────────────────────────────────────────────────────────────────
+ HOUSES = ["House of Commons", "House of Lords"]
+ HOUSE_DISTRIBUTION = {"House of Commons": 0.78, "House of Lords": 0.22}
+
+ # ─── Generation parameters (from Config in speech_generator.py) ───────────────
+ DEFAULT_GEN_PARAMS = {
+     "temperature": 0.7,
+     "top_p": 0.85,
+     "repetition_penalty": 1.2,
+     "max_new_tokens": 850,
+     "min_words": 43,   # P10 threshold
+     "max_words": 635,  # P90 threshold
+ }
+
+ # ─── Model registry ────────────────────────────────────────────────────────────
+ # Note: Yi is 6B (not 9B) — from ModelConfig in speech_generator.py.
+ # Fine-tuned models: LoRA adapters uploaded to HF model repos.
+ # Baseline models: loaded directly from Unsloth's 4-bit quantised repos.
+ MODELS = {
+     # Fine-tuned (LoRA adapters — argyrotsipi HF repos)
+     "Mistral-7B (fine-tuned)": "argyrotsipi/parliabench-unsloth-mistral-7b-v0.3",
+     "Llama-3.1-8B (fine-tuned)": "argyrotsipi/parliabench-unsloth-llama-3.1-8b",
+     "Gemma-2-9B (fine-tuned)": "argyrotsipi/parliabench-unsloth-gemma-2-9b",
+     "Qwen2-7B (fine-tuned)": "argyrotsipi/parliabench-unsloth-qwen-2-7b",
+     "Yi-1.5-6B (fine-tuned)": "argyrotsipi/parliabench-unsloth-yi-1.5-6b",
+     # Baselines (raw 4-bit quantised from Unsloth)
+     "Mistral-7B (baseline)": "unsloth/mistral-7b-v0.3-bnb-4bit",
+     "Llama-3.1-8B (baseline)": "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
+     "Gemma-2-9B (baseline)": "unsloth/gemma-2-9b-bnb-4bit",
+     "Qwen2-7B (baseline)": "unsloth/Qwen2-7B-bnb-4bit",
+     "Yi-1.5-6B (baseline)": "unsloth/Yi-1.5-6B-bnb-4bit",
+ }
+
+ # Map display name → model family key (for template + stop-string selection)
+ MODEL_FAMILY = {
+     "Mistral-7B (fine-tuned)": "mistral",
+     "Llama-3.1-8B (fine-tuned)": "llama",
+     "Gemma-2-9B (fine-tuned)": "gemma",
+     "Qwen2-7B (fine-tuned)": "qwen",
+     "Yi-1.5-6B (fine-tuned)": "yi",
+     "Mistral-7B (baseline)": "mistral",
+     "Llama-3.1-8B (baseline)": "llama",
+     "Gemma-2-9B (baseline)": "gemma",
+     "Qwen2-7B (baseline)": "qwen",
+     "Yi-1.5-6B (baseline)": "yi",
+ }
+
+ # Stop strings, start/end markers, and tokens to strip
+ # (from ModelConfig.MODELS in speech_generator.py — exact values)
+ MODEL_CONFIG = {
+     "mistral": {
+         "base_model": "unsloth/mistral-7b-v0.3-bnb-4bit",
+         "stop_strings": ["</s>", "\n[INST]", "\nContext:", "\nInstruction:"],
+         "start_marker": "[/INST]",
+         "end_markers": ["</s>", "\n[INST]", "\nContext:"],
+         "special_tokens_to_remove": ["</s>", "<s>"],
+     },
+     "llama": {
+         "base_model": "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
+         "stop_strings": ["<|eot_id|>", "\n<|start_header_id|>user",
+                          "\nContext:", "\nInstruction:"],
+         "start_marker": "<|start_header_id|>assistant<|end_header_id|>",
+         "end_markers": ["<|eot_id|>", "</s>", "<|end_of_text|>",
+                         "\n<|start_header_id|>"],
+         "special_tokens_to_remove": ["<|eot_id|>", "</s>", "<|end_of_text|>",
+                                      "<|start_header_id|>", "<|end_header_id|>"],
+     },
+     "gemma": {
+         "base_model": "unsloth/gemma-2-9b-bnb-4bit",
+         "stop_strings": ["<end_of_turn>", "\n<start_of_turn>user",
+                          "\nContext:", "\nInstruction:"],
+         "start_marker": "<start_of_turn>model",
+         "end_markers": ["<end_of_turn>", "\n<start_of_turn>user", "\n<bos>"],
+         "special_tokens_to_remove": ["<end_of_turn>", "<start_of_turn>", "<bos>", "<eos>"],
+     },
+     "qwen": {
+         "base_model": "unsloth/Qwen2-7B-bnb-4bit",
+         "stop_strings": ["<|im_end|>", "\n<|im_start|>user",
+                          "\nContext:", "\nInstruction:"],
+         "start_marker": "<|im_start|>assistant",
+         "end_markers": ["<|im_end|>", "\n<|im_start|>user",
+                         "\n<|im_start|>system"],
+         "special_tokens_to_remove": ["<|im_end|>", "<|im_start|>", "<|endoftext|>"],
+     },
+     "yi": {
+         "base_model": "unsloth/Yi-1.5-6B-bnb-4bit",
+         "stop_strings": ["<|im_end|>", "\n<|im_start|>user",
+                          "\nContext:", "\nInstruction:"],
+         "start_marker": "<|im_start|>assistant",
+         "end_markers": ["<|im_end|>", "\n<|im_start|>user"],
+         "special_tokens_to_remove": ["<|im_end|>", "<|im_start|>", "<|endoftext|>"],
+     },
+ }
+
+
+ # ─── Helper functions ─────────────────────────────────────────────────────────
+
+ def get_valid_houses(party: str) -> list:
+     """Return the allowed houses for a given party."""
+     if party not in COMMONS_PARTIES:
+         return ["House of Lords"]
+     return HOUSES
+
+
+ def get_orientation(party: str) -> str:
+     return PARTY_DISTRIBUTION.get(party, {}).get("orientation", "Unknown")
+
+
+ def build_context_string(party: str, topic: str, section: str,
+                          orientation: str, house: str) -> str:
+     """
+     Build the pipe-separated context string used at generation time.
+     Matches speech_generator.py: context = " | ".join(context_parts)
+     """
+     parts = [
+         f"EUROVOC TOPIC: {topic}",
+         f"SECTION: {section}",
+         f"PARTY: {party}",
+         f"POLITICAL ORIENTATION: {orientation}",
+         f"HOUSE: {house}",
+     ]
+     return " | ".join(parts)
+
+
+ def count_tokens_approx(text: str) -> int:
+     """Rough token estimate (~words × 1.3)."""
+     return int(len(text.split()) * 1.3)
+
+
+ # ─── Speech Validator ─────────────────────────────────────────────────────────
+ # Ported from SpeechValidator in speech_generator.py (9-step logic)
+
+ _TEMPLATE_MARKERS = [
+     "\nuser", "\nassistant", "\nsystem", "\nmodel",
+     "user\n", "assistant\n", "system\n", "model\n",
+     "<s>", "system<|", "|>system",
+     "Context:", "Instruction:", "EUROVOC TOPIC:", "SECTION:",
+     "PARTY:", "POLITICAL ORIENTATION:", "HOUSE:",
+     "<|", "|>", "<s>", "</s>", "<bos>", "<eos>",
+     "<start_of_turn>", "<end_of_turn>",
+     "<|im_start|>", "<|im_end|>",
+     "[INST]", "[/INST]", "Response:",
+ ]
+
+ _CORRUPTION_PATTERNS = [
+     "▁", "γƒ»", "β\"", "erusform", "});", "</>",
+     "▍", "▌", "▊", "█", "・", "━", "┃", "├", "�",
+     "<2mass>", "<3mass>", "<4mass>",
+ ]
+
+ _FORBIDDEN_RANGES = [
+     (0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0x3040, 0x309F),
+     (0x30A0, 0x30FF), (0xAC00, 0xD7AF), (0x0600, 0x06FF),
+     (0x0400, 0x04FF), (0x0E00, 0x0E7F), (0x2580, 0x259F),
+     (0x2200, 0x22FF), (0x2300, 0x23FF),
+ ]
+
+ _REFUSAL_PATTERNS = [
+     "I am not capable of generating",
+     "I cannot generate",
+     "I'm sorry but I cannot",
+     "This is a Parliamentary Speech generator",
+     "You are asked to",
+ ]
+
+
+ def validate_speech(text: str,
+                     min_words: int = DEFAULT_GEN_PARAMS["min_words"],
+                     max_words: int = DEFAULT_GEN_PARAMS["max_words"]) -> tuple:
+     """
+     Validate a generated speech.
+     Returns (is_valid: bool, reason: str).
+     """
+     if not text or not text.strip():
+         return False, "EMPTY_SPEECH"
+
+     # Step 1: Template leakage
+     for marker in _TEMPLATE_MARKERS:
+         if marker in text:
+             return False, f"TEMPLATE_LEAK: {marker!r}"
+
+     # Step 2: Unicode corruption — specific patterns
+     for pattern in _CORRUPTION_PATTERNS:
+         if pattern in text:
+             return False, f"ENCODING_ERROR: {pattern!r}"
+
+     # Step 2b: Forbidden Unicode script ranges
+     for char in text:
+         cp = ord(char)
+         for start, end in _FORBIDDEN_RANGES:
+             if start <= cp <= end:
+                 return False, f"UNICODE_CORRUPTION: U+{cp:04X}"
+
+     # Step 3: Repetition — same word 4+ times consecutively
+     words = text.split()
+     for i in range(len(words) - 3):
+         w = words[i].lower()
+         if len(w) > 3 and all(words[i + j].lower() == w for j in range(1, 4)):
+             return False, f"REPETITION: '{w}' × 4"
+
+     # Step 3b: Repeated sequences of 3–7 words
+     for seq_len in range(3, 8):
+         for i in range(len(words) - seq_len * 3):
+             seq = tuple(w.lower() for w in words[i:i + seq_len])
+             count, j = 1, i + seq_len
+             while j + seq_len <= len(words):
+                 if tuple(w.lower() for w in words[j:j + seq_len]) == seq:
+                     count += 1
+                     j += seq_len
+                 else:
+                     break
+             if count > 3:
+                 snippet = " ".join(words[i:i + seq_len])
+                 return False, f"REPETITION: sequence × {count} '{snippet[:30]}'"
+
+     # Step 4: Counting pattern
+     counting = ["first", "second", "third", "fourth", "fifth",
+                 "sixth", "seventh", "eighth", "ninth", "tenth"]
+     if sum(1 for w in counting if w in text.lower()) > 5:
+         return False, "REPETITION: counting_pattern"
+
+     # Step 5: Length
+     wc = len(words)
+     if wc < min_words:
+         return False, f"TOO_SHORT: {wc} words (min {min_words})"
+     if wc > max_words:
+         return False, f"TOO_LONG: {wc} words (max {max_words})"
+
+     # Step 6: Concatenated speeches
+     openings = (text.count("My Lords") + text.count("Mr Speaker")
+                 + text.count("Madam Deputy Speaker"))
+     if openings >= 4:
+         return False, f"CONCATENATION: {openings} openings detected"
+
+     # Step 7: Corrupted endings
+     if any(text.endswith(end) for end in ["});", "▍▍▍", "...."]):
+         return False, "CORRUPTED_ENDING"
+
+     # Step 8: Refusal / role confusion
+     tl = text.lower()
+     for p in _REFUSAL_PATTERNS:
+         if tl.startswith(p.lower()):
+             return False, f"META_REFUSAL: {p[:30]!r}"
+
+     return True, "OK"
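The validator's fail-fast step logic can be illustrated with a condensed, self-contained version covering just two of the checks above (step 1, template leakage, and step 5, word-count bounds); the full `validate_speech` adds corruption, repetition, concatenation, and refusal checks on top. The marker list here is a small assumed subset for illustration.

```python
# Small subset of the template markers used above, for illustration only.
MARKERS = ["<|im_end|>", "[INST]", "Context:", "Instruction:"]


def quick_validate(text: str, min_words: int = 43, max_words: int = 635):
    """Condensed two-check validator: (is_valid, reason), fail-fast order."""
    # Step 1: any leaked chat-template token invalidates the speech outright.
    if any(m in text for m in MARKERS):
        return False, "TEMPLATE_LEAK"
    # Step 5: enforce the P10/P90 word-count bounds.
    wc = len(text.split())
    if wc < min_words:
        return False, f"TOO_SHORT: {wc} words"
    if wc > max_words:
        return False, f"TOO_LONG: {wc} words"
    return True, "OK"


print(quick_validate("Mr Speaker, I rise today."))              # too short
print(quick_validate("Mr Speaker, " + "hear " * 60 + "[INST]"))  # leaked token
```

Note the ordering matters: a leaked token is reported even when the speech is also too short, matching the fail-fast sequence in `validate_speech`.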