XLMR-large-qa-council: Boundary Detection in Municipal Meeting Minutes
Model Description
This model performs extractive Question Answering (QA) to detect structural segments in Portuguese municipal meeting minutes, namely the Opening, Body, and Closing sections.
Given the full text of a meeting minute and a predefined question targeting a specific segment, the model predicts the most relevant text span using character-level start and end offsets.
It follows the SQuAD v2 paradigm, allowing the model to explicitly return no answer when a segment is not present in the document.
The model is designed to operate on long, unstructured administrative texts and is typically used as a preprocessing step for downstream tasks such as metadata extraction.
Key Features
🏛️ Specialized for Municipal Minutes
Fine-tuned on Portuguese municipal council meeting minutes, capturing the structural patterns of administrative documents.

🧩 Extractive Question Answering
Predicts precise start and end offsets for the Opening, Body, and Closing segments using a span-based QA formulation.

⚙️ Transformer-based Architecture
Built on a pre-trained transformer model and adapted to handle long, unstructured texts through window-based inference.

📈 Robust QA Performance
Achieves strong F1 scores on a held-out Portuguese test set, demonstrating reliable segment detection across municipalities.
Model Details
- Base Model: deepset/xlm-roberta-large-squad2
- Architecture: Transformer encoder with a span prediction head for extractive Question Answering
- Parameters: ~550M
- Maximum Sequence Length: 512 tokens
- Fine-tuning Dataset: 120 Portuguese municipal meeting minutes from 6 different municipalities
- Answer Types: opening, body, and closing (with no-answer cases following the SQuAD v2 formulation)
- Training Framework: PyTorch with Hugging Face Transformers
- Evaluation Metrics: Exact Match (EM), F1 score, and Boundary Accuracy, following the SQuAD v2 evaluation protocol
How It Works
The model follows a standard extractive Question Answering pipeline.
Given a question targeting a specific structural segment (e.g., Opening or Body) and the full text of a meeting minute as context, both inputs are jointly tokenized and passed to the transformer model. The model predicts start and end logits for each token in the sequence, corresponding to the most likely answer span.
For long documents exceeding the maximum sequence length, the context is split into overlapping windows. Each window is processed independently, and the final answer is selected based on the highest scoring span across all windows, while also considering the model’s no-answer (null) score in accordance with the SQuAD v2 protocol.
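As a rough illustration of the windowing step, the helper below (a simplified sketch, not part of the released code) computes where each overlapping window starts over a tokenized context. It assumes `window` context tokens fit per chunk once the question and special tokens are accounted for, and that consecutive chunks overlap by `stride` tokens, mirroring the 512-token budget and 128-token stride used for inference:

```python
def window_starts(n_context_tokens: int, window: int, stride: int) -> list[int]:
    """Start offsets of each overlapping window over the context tokens."""
    starts = [0]
    # Each new window re-reads the last `stride` tokens of the previous one
    while starts[-1] + window < n_context_tokens:
        starts.append(starts[-1] + window - stride)
    return starts

# e.g. 1000 context tokens, ~384 context tokens per window, stride of 128:
print(window_starts(1000, 384, 128))  # → [0, 256, 512, 768]
```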
The following example illustrates how to perform inference using this model:
```python
import json

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Hugging Face repo id (or a local path to a downloaded checkpoint)
MODEL_PATH = "liaad/Citilink_XLMR-large_Structural-Segmentation"

# Questions used during training (in Portuguese), each targeting one boundary
QUESTIONS = [
    # "What is the last sentence of the minute's introduction, before the agenda starts?"
    "Qual é a última frase da introdução da ata, antes do início da ordem do dia?",
    # "What is the first sentence that marks the start of the agenda period?"
    "Qual é a primeira frase que marca o início do período da ordem do dia?",
    # "What is the last sentence that closes the agenda period?"
    "Qual é a última frase que encerra o período da ordem do dia?",
    # "What is the first sentence that indicates the closing of the minute, after the agenda ends?"
    "Qual é a primeira frase que indica o fecho da ata, após o término da ordem do dia?",
    # "What is the first sentence of the signature segment at the end of the minute?"
    "Qual é a primeira frase do segmento das assinaturas no final da ata?",
]

MAX_SEQ_LENGTH = 512
DOC_STRIDE = 128
MAX_ANSWER_LENGTH = 500  # maximum answer span length, in tokens


def infer(model, tokenizer, context, question):
    # Tokenize with a sliding window over the context
    inputs = tokenizer(
        question,
        context,
        truncation="only_second",
        max_length=MAX_SEQ_LENGTH,
        stride=DOC_STRIDE,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
        return_tensors="pt",
    )
    inputs.pop("overflow_to_sample_mapping", None)
    offset_mapping = inputs.pop("offset_mapping")

    all_start_logits = []
    all_end_logits = []
    with torch.no_grad():
        for i in range(inputs["input_ids"].shape[0]):
            features = {k: v[i].unsqueeze(0).to(model.device) for k, v in inputs.items()}
            outputs = model(**features)
            all_start_logits.append(outputs.start_logits.cpu().numpy())
            all_end_logits.append(outputs.end_logits.cpu().numpy())
    all_start_logits = np.concatenate(all_start_logits, axis=0)
    all_end_logits = np.concatenate(all_end_logits, axis=0)

    # SQuAD v2 null (no-answer) score: start + end logits at the CLS position.
    # The window with the lowest null score is the most likely to hold the answer.
    null_scores = [
        float(all_start_logits[i][0] + all_end_logits[i][0])
        for i in range(len(all_start_logits))
    ]
    best_window = int(np.argmin(null_scores))

    start_logit = all_start_logits[best_window]
    end_logit = all_end_logits[best_window]

    # Mask out question and special tokens so spans can only come from the context
    sequence_ids = inputs.sequence_ids(best_window)
    offsets = [
        tuple(off) if sequence_ids[k] == 1 else None
        for k, off in enumerate(offset_mapping[best_window].tolist())
    ]

    n_best_size = 50  # top candidates per direction; adjust as needed
    prelim_predictions = []
    start_indexes = np.argsort(start_logit)[-n_best_size:].tolist()
    end_indexes = np.argsort(end_logit)[-n_best_size:].tolist()
    for start_idx in start_indexes:
        for end_idx in end_indexes:
            if start_idx >= len(offsets) or end_idx >= len(offsets):
                continue
            if offsets[start_idx] is None or offsets[end_idx] is None:
                continue
            if end_idx < start_idx:
                continue
            if end_idx - start_idx + 1 > MAX_ANSWER_LENGTH:
                continue
            start_char = offsets[start_idx][0]
            end_char = offsets[end_idx][1]
            prelim_predictions.append({
                "score": float(start_logit[start_idx] + end_logit[end_idx]),
                "text": context[start_char:end_char],
                "start": start_char,
                "end": end_char,
            })

    if not prelim_predictions:
        # No valid span found: treat as a SQuAD v2 no-answer case
        return {"text": "", "start": -1, "end": -1}
    best_pred = max(prelim_predictions, key=lambda x: x["score"])
    return {
        "text": best_pred["text"].strip(),
        "start": best_pred["start"],
        "end": best_pred["end"],
    }


def segmentar_ata(context, results, formato):
    # Map each question to the character offsets of its predicted answer
    offsets = {r["question"]: (r["start"], r["end"]) for r in results}
    fim_intro = offsets[QUESTIONS[0]][1]           # end of the introduction
    inicio_ordem = offsets[QUESTIONS[1]][0]        # start of the order of the day
    fim_ordem = offsets[QUESTIONS[2]][1]           # end of the order of the day
    inicio_fecho = offsets[QUESTIONS[3]][0]        # start of the closing
    inicio_assinaturas = offsets[QUESTIONS[4]][0]  # start of the signatures

    segmentos = {
        "intro": context[0:fim_intro].strip(),
        "body": context[inicio_ordem:fim_ordem].strip(),
    }
    if formato in (2, 3):
        if formato == 3:
            conclusao = context[inicio_fecho:inicio_assinaturas].strip()
        else:
            conclusao = context[inicio_fecho:].strip()
        segmentos["conclusion"] = conclusao
    return segmentos


if __name__ == "__main__":
    # Path to the .txt file containing the minute
    FILE_PATH = "test.txt"
    with open(FILE_PATH, "r", encoding="utf-8") as f:
        context = f.read()
    # Alternatively, pass the string directly:
    # context = "Minute number (...)"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, use_fast=True)
    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_PATH)
    model.eval()
    model.to("cuda" if torch.cuda.is_available() else "cpu")

    results = []
    for q in QUESTIONS:
        pred = infer(model, tokenizer, context, q)
        print(f"\n❓ {q}\n➡️ {pred['text']}")
        print(f"[Offsets: {pred['start']}–{pred['end']}]")
        results.append({
            "question": q,
            "answer": pred["text"],
            "start": pred["start"],
            "end": pred["end"],
        })

    # Municipalities use different minute formats; the format is detected
    # automatically from which closing/signature questions were answered
    resp4 = results[3]["answer"].strip()
    resp5 = results[4]["answer"].strip()
    if not resp4 and not resp5:
        formato = 1  # no explicit closing or signature segment
    elif resp4 and not resp5:
        formato = 2  # closing present, no signature block
    elif resp4 and resp5:
        formato = 3  # closing followed by a signature block
    else:
        formato = 0  # signatures without a closing (unexpected)

    segmentos = segmentar_ata(context, results, formato)
    with open("test.json", "w", encoding="utf-8") as f:
        json.dump(segmentos, f, indent=2, ensure_ascii=False)
```
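The resulting `test.json` maps segment names to their extracted text. A hypothetical result for a format-3 minute (illustrative placeholder text, not real model output) might look like:

```json
{
  "intro": "Aos dez dias do mês de (...) opening text of the minute",
  "body": "Ordem do dia: (...) discussion of the agenda items",
  "conclusion": "E nada mais havendo a tratar (...) closing text before the signatures"
}
```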
Evaluation Results
Municipal Meeting Minutes Test Set
| Metric | Score |
|---|---|
| F1 score | 0.88 |
| Exact Match | 0.81 |
| Boundary Accuracy | 0.90 |
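For reference, the Exact Match and token-level F1 metrics in the table above can be sketched as follows. This is a simplified version of SQuAD-style scoring: it uses plain whitespace tokenization and omits the official answer normalization (lowercasing, punctuation and article stripping):

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted span matches the reference exactly, else 0.0."""
    return float(prediction.strip() == reference.strip())


def f1_score(prediction: str, reference: str) -> float:
    """Token-overlap F1 between predicted and reference spans."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # SQuAD v2 convention: both empty (no-answer) scores 1.0, otherwise 0.0
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```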
Limitations
Domain Specificity
The model is fine-tuned on Portuguese municipal meeting minutes and performs best on administrative and governmental texts. Performance may degrade on documents with a substantially different structure or writing style.

Language Dependency
Although based on a multilingual pre-trained model, the fine-tuning data is exclusively in Portuguese. As a result, performance on other languages has not been validated and is not guaranteed.

Context Window Length
The model has a maximum input length of 512 tokens. Longer documents require window-based processing, which may lead to partial or fragmented segment predictions in edge cases.

Structural Variability
Municipal minutes can vary significantly across municipalities and time periods. Unseen formatting patterns or atypical section ordering may reduce prediction accuracy.
License
This model is released under the cc-by-nc-nd-4.0 license.
Model tree for liaad/Citilink_XLMR-large_Structural-Segmentation
- Base model: deepset/xlm-roberta-large-squad2