XLMR-large-qa-council: Boundary Detection in Municipal Meeting Minutes

Model Description

This model performs extractive Question Answering (QA) to detect structural segments in Portuguese municipal meeting minutes, namely the Opening, Body, and Closing sections.

Given the full text of a meeting minute and a predefined question targeting a specific segment, the model predicts the most relevant text span using character-level start and end offsets.
It follows the SQuAD v2 paradigm, allowing the model to explicitly return no answer when a segment is not present in the document.

The model is designed to operate on long, unstructured administrative texts and is typically used as a preprocessing step for downstream tasks such as metadata extraction.
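For a quick experiment, the model can also be queried through the Hugging Face question-answering pipeline, which handles windowed inference and the SQuAD v2 no-answer case out of the box. A minimal sketch (minute_text is a placeholder for the full document string and is not defined here):

from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="liaad/Citilink_XLMR-large_Structural-Segmentation",
)
pred = qa(
    question="Qual é a primeira frase que marca o início do período da ordem do dia?",
    context=minute_text,            # full text of one meeting minute (assumed defined)
    handle_impossible_answer=True,  # allow the SQuAD v2 no-answer outcome
    max_seq_len=512,
    doc_stride=128,
)
print(pred["answer"], pred["start"], pred["end"])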

Key Features

  • 🏛️ Specialized for Municipal Minutes
    Fine-tuned on Portuguese municipal council meeting minutes, capturing the structural patterns of administrative documents.

  • 🧩 Extractive Question Answering
    Predicts precise start and end offsets for Opening, Body, and Closing segments using a span-based QA formulation.

  • ⚙️ Transformer-based Architecture
    Built on a pre-trained transformer model and adapted to handle long, unstructured texts through window-based inference.

  • 📈 Robust QA Performance
    Achieves 0.88 F1 and 0.90 boundary accuracy on a held-out Portuguese test set (see Evaluation Results), demonstrating reliable segment detection across municipalities.

Model Details

  • Base Model: deepset/xlm-roberta-large-squad2
  • Architecture: Transformer encoder with a span prediction head for extractive Question Answering
  • Parameters: ~550M
  • Maximum Sequence Length: 512 tokens
  • Fine-tuning Dataset: 120 Portuguese municipal meeting minutes from 6 different municipalities
  • Answer Types: opening, body, and closing segments; unanswerable (no-answer) cases follow the SQuAD v2 formulation
  • Training Framework: PyTorch with Hugging Face Transformers
  • Evaluation Metrics: Exact Match (EM), F1 score, and Boundary Accuracy, following the SQuAD v2 evaluation protocol

How It Works

The model follows a standard extractive Question Answering pipeline.

Given a question targeting a specific structural segment (e.g., Opening or Body) and the full text of a meeting minute as context, both inputs are jointly tokenized and passed to the transformer model. The model predicts start and end logits for each token in the sequence, corresponding to the most likely answer span.

For long documents exceeding the maximum sequence length, the context is split into overlapping windows. Each window is processed independently, and the final answer is selected based on the highest scoring span across all windows, while also considering the model’s no-answer (null) score in accordance with the SQuAD v2 protocol.
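For reference, the canonical SQuAD v2 decision rule compares the best span score against the null score using a tunable threshold; the example script below instead uses a simpler heuristic that first picks the window with the lowest null score. A minimal sketch of the canonical rule (select_answer is a hypothetical helper, not part of this model's API):

def select_answer(best_span, best_span_score, null_score, null_threshold=0.0):
    # Keep the span only if it outscores the no-answer option by the threshold;
    # otherwise predict "no answer" (the segment is absent from the document).
    if best_span_score - null_score > null_threshold:
        return best_span
    return ""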

The following example illustrates how to perform inference using this model:

import json

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Hugging Face repository id, or a local path to a downloaded checkpoint
MODEL_PATH = "liaad/Citilink_XLMR-large_Structural-Segmentation"

# The five boundary questions used during fine-tuning (in Portuguese)
QUESTIONS = [
    # "What is the last sentence of the minute's introduction, before the start of the order of the day?"
    "Qual é a última frase da introdução da ata, antes do início da ordem do dia?",
    # "What is the first sentence that marks the start of the order-of-the-day period?"
    "Qual é a primeira frase que marca o início do período da ordem do dia?",
    # "What is the last sentence that closes the order-of-the-day period?"
    "Qual é a última frase que encerra o período da ordem do dia?",
    # "What is the first sentence that indicates the closing of the minute, after the order of the day ends?"
    "Qual é a primeira frase que indica o fecho da ata, após o término da ordem do dia?",
    # "What is the first sentence of the signatures segment at the end of the minute?"
    "Qual é a primeira frase do segmento das assinaturas no final da ata?"
]

MAX_SEQ_LENGTH = 512     # model's maximum input length, in tokens
DOC_STRIDE = 128         # overlap between consecutive windows, in tokens
MAX_ANSWER_LENGTH = 500  # maximum answer span length, in tokens


def infer(model, tokenizer, context, question):
    """Run windowed extractive QA over a long context; return the best span."""

    # Tokenize with a sliding window: the context is split into overlapping
    # chunks of MAX_SEQ_LENGTH tokens with DOC_STRIDE tokens of overlap
    inputs = tokenizer(
        question,
        context,
        truncation="only_second",
        max_length=MAX_SEQ_LENGTH,
        stride=DOC_STRIDE,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
        return_tensors="pt"
    )

    inputs.pop("overflow_to_sample_mapping", None)  # not needed for a single document
    offset_mapping = inputs.pop("offset_mapping")   # token → character offsets per window

    all_start_logits = []
    all_end_logits = []

    # Score each window independently and collect its logits
    with torch.no_grad():
        for i in range(inputs["input_ids"].shape[0]):
            features = {k: v[i].unsqueeze(0).to(model.device) for k, v in inputs.items()}
            outputs = model(**features)
            all_start_logits.append(outputs.start_logits.cpu().numpy())
            all_end_logits.append(outputs.end_logits.cpu().numpy())

    all_start_logits = np.concatenate(all_start_logits, axis=0)
    all_end_logits = np.concatenate(all_end_logits, axis=0)


    # SQuAD v2 null score: start + end logits at the [CLS] token (index 0).
    # The window with the LOWEST null score is the one most likely to contain
    # a real answer (a simplified per-window heuristic; see "How It Works").
    null_scores = [
        float(all_start_logits[i][0] + all_end_logits[i][0])
        for i in range(len(all_start_logits))
    ]
    best_window = int(np.argmin(null_scores))

    start_logit = all_start_logits[best_window]
    end_logit = all_end_logits[best_window]
    offsets = offset_mapping[best_window]
    sequence_ids = inputs.sequence_ids(best_window)  # 0 = question, 1 = context, None = special token

    n_best_size = 50            # number of top start/end candidates to consider (tunable)
    prelim_predictions = []

    start_indexes = np.argsort(start_logit)[-n_best_size:].tolist()
    end_indexes = np.argsort(end_logit)[-n_best_size:].tolist()

    for start_idx in start_indexes:
        for end_idx in end_indexes:

            # Skip out-of-range indices, tokens outside the context (question
            # or special tokens), inverted spans, and over-long spans
            if start_idx >= len(offsets) or end_idx >= len(offsets):
                continue
            if sequence_ids[start_idx] != 1 or sequence_ids[end_idx] != 1:
                continue
            if end_idx < start_idx:
                continue
            if end_idx - start_idx + 1 > MAX_ANSWER_LENGTH:
                continue

            start_char = int(offsets[start_idx][0])
            end_char = int(offsets[end_idx][1])
            text = context[start_char:end_char]

            score = start_logit[start_idx] + end_logit[end_idx]

            prelim_predictions.append({
                "score": score,
                "text": text,
                "start": start_char,
                "end": end_char
            })

    if not prelim_predictions:
        # No valid span found: treat as a SQuAD v2 no-answer
        return {"text": "", "start": -1, "end": -1}

    best_pred = max(prelim_predictions, key=lambda x: x["score"])

    return {
        "text": best_pred["text"].strip(),
        "start": best_pred["start"],
        "end": best_pred["end"]
    }

if __name__ == "__main__":
    # Path to a plain-text file containing one full meeting minute
    FILE_PATH = "test.txt"

    with open(FILE_PATH, "r", encoding="utf-8") as f:
        context = f.read()

    # Alternatively, pass the text directly:
    # context = "Minute number (...)"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, use_fast=True)
    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_PATH)
    model.eval()
    model.to("cuda" if torch.cuda.is_available() else "cpu")

    results = []
    for q in QUESTIONS:
        pred = infer(model, tokenizer, context, q)
        print(f"\n❓ {q}\n➡️ {pred['text']}")
        print(f"[Offsets: {pred['start']}{pred['end']}]")
        results.append({
            "question": q,
            "answer": pred["text"],
            "start": pred["start"],
            "end": pred["end"]
        })

    def segmentar_ata(context, results, formato):
        """Slice the minute into segments using the predicted character offsets."""

        offsets = {r["question"]: (r["start"], r["end"]) for r in results}

        # The first three questions are assumed to always be answered; -1
        # offsets (no answer) can only come from the closing/signature
        # questions, which the format detection below takes into account
        inicio_intro = 0
        fim_intro = offsets[QUESTIONS[0]][1]
        inicio_ordem = offsets[QUESTIONS[1]][0]
        fim_ordem = offsets[QUESTIONS[2]][1]
        inicio_fecho = offsets[QUESTIONS[3]][0]
        inicio_assinaturas = offsets[QUESTIONS[4]][0]

        introducao = context[inicio_intro:fim_intro].strip()
        corpo_ata = context[inicio_ordem:fim_ordem].strip()

        segmentos = {
            "intro": introducao,
            "body": corpo_ata
        }

        # Formats 2 and 3 have an explicit closing; format 3 also ends with
        # a signature block, which is excluded from the conclusion
        if formato in (2, 3):
            if formato == 3:
                conclusao = context[inicio_fecho:inicio_assinaturas].strip()
            else:
                conclusao = context[inicio_fecho:].strip()

            segmentos["conclusion"] = conclusao

        return segmentos

    resp4 = results[3]["answer"].strip()
    resp5 = results[4]["answer"].strip()

    # Municipalities follow different minute layouts; the format is detected
    # automatically from which closing/signature questions were answered
    if not resp4 and not resp5:
        formato = 1   # no explicit closing
    elif resp4 and not resp5:
        formato = 2   # closing present, no signature block
    elif resp4 and resp5:
        formato = 3   # closing and signature block present
    else:
        formato = 0   # signature block without a closing (unexpected)

    segmentos = segmentar_ata(context, results, formato)

    output_seg_path = "test.json"
    with open(output_seg_path, "w", encoding="utf-8") as f:
        json.dump(segmentos, f, indent=2, ensure_ascii=False)
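As a quick sanity check, the saved file can be reloaded to confirm which segments were detected (an illustrative snippet; the keys mirror the segmentos dictionary built above):

import json

# Reload the saved segments and list the detected sections
with open("test.json", encoding="utf-8") as f:
    segs = json.load(f)
print(list(segs.keys()))  # e.g. ['intro', 'body', 'conclusion'] for format 3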

Evaluation Results

Municipal Meeting Minutes Test Set

Metric               Score
F1 score             0.88
Exact Match          0.81
Boundary Accuracy    0.90
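EM and F1 follow the standard SQuAD-style token-overlap computation; a simplified sketch is shown below (the official SQuAD script additionally strips English articles, which does not apply to Portuguese, and Boundary Accuracy is specific to this work, presumably comparing predicted and gold boundary offsets, so it is not reproduced here):

import re
from collections import Counter

def normalize(s: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace (SQuAD-style)
    s = re.sub(r"[^\w\s]", " ", s.lower())
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:                 # no-answer cases: exact comparison
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)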

Limitations

  • Domain Specificity
    The model is fine-tuned on Portuguese municipal meeting minutes and performs best on administrative and governmental texts. Performance may degrade on documents with substantially different structure or writing style.

  • Language Dependency
    Although based on a multilingual pre-trained model, the fine-tuning data is exclusively in Portuguese. As a result, performance on other languages has not been validated and is not guaranteed.

  • Context Window Length
    The model has a maximum input length of 512 tokens. Longer documents require window-based processing, which may lead to partial or fragmented segment predictions in edge cases.

  • Structural Variability
    Municipal minutes can vary significantly across municipalities and time periods. Unseen formatting patterns or atypical section ordering may reduce prediction accuracy.

License

This model is released under the cc-by-nc-nd-4.0 license.
