---
language:
- pt
license: cc-by-nc-nd-4.0
colorTo: blue
tags:
- text-summarization
- abstractive-summarization
- portuguese
- administrative-documents
- municipal-meetings
- bart
library_name: transformers
base_model:
- facebook/bart-base
---
# Bart-Base-Summarization-Council-PT: Abstractive Summarization of Portuguese Municipal Meeting Minutes Discussion Subjects
## Model Description
**Bart-Base-Summarization-Council-PT** is an **abstractive text summarization model** based on **BART-base**, fine-tuned to produce concise and informative summaries of discussion subjects from **Portuguese municipal meeting minutes**.
The model was trained on a curated and annotated corpus of official municipal meeting minutes covering a variety of administrative and political topics at the municipal level.
**Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/Citilink-Summ-PT)
### Key Features
- 🧾 **Abstractive Summarization** – Generates natural, human-like summaries rather than extracts.
- 🇵🇹 **European Portuguese** – Optimized for official and administrative Portuguese.
- 🏛️ **Domain-Specific** – Trained on municipal meeting minutes and administrative discussions.
- ⚙️ **Fine-tuned BART** – Built upon `facebook/bart-base` using supervised fine-tuning.
- 🧠 **Fact-Aware Generation** – Produces short summaries that preserve factual content.
---
## Model Details
- **Architecture:** `facebook/bart-base`
- **Task:** Abstractive summarization (`text → summary`)
- **Framework:** 🤗 Transformers (PyTorch)
- **Tokenizer:** BART-base tokenizer (the original English-centric vocabulary, applied as-is to Portuguese text)
- **Max Input Length:** 1024 tokens
- **Max Summary Length:** 128 tokens
- **Training Objective:** Conditional generation (cross-entropy loss)
- **Dataset:** Portuguese municipal meeting minutes annotated with summaries
---
## How It Works
The model receives the text of a single discussion subject from a municipal meeting minute and outputs a short, coherent summary highlighting:
- The **main subject or topic** of discussion
- Any **decisions, motions, or proposals** made
- The **entities or departments** involved
### Example Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "anonymous12321/Bart-Base-Summarization-Council-PT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A discussion subject taken from a municipal meeting minute
text = """
17. PROCESSO DE OBRAS N.º ***** -- EDIFIC\nPelo Senhor Presidente foi presente a esta reunião a informação n.º ****** da Secção de Urbanismo e Fiscalização -- Serviço de Obras Particulares que se anexa à presente ata. \nPonderado e analisado o assunto o Executivo Municipal deliberou por unanimidade aprovar as especialidades relativas ao processo de obras n.º ***** -- EDIFIC.
"""

# Truncate the input to the model's 1024-token limit
inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)

# Beam search, capped at the 128-token summary length used in training
summary_ids = model.generate(**inputs, max_length=128, num_beams=4, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
### 🧾 Model Output
> "O Executivo Municipal aprovou, por unanimidade, as especialidades relativas a um processo de obras particulares."
---
## 📊 Evaluation Results
### Quantitative Metrics (on held-out test set)
| Metric | Score | Description |
|:-------|:------:|:------------|
| **ROUGE-1** | 0.556 | Unigram overlap between generated and reference summaries |
| **ROUGE-2** | 0.432 | Bigram overlap |
| **ROUGE-L** | 0.503 | Longest common subsequence overlap |
| **BERTScore (F1)** | 0.807 | Semantic similarity between summary and reference |
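For intuition about the first row of the table: ROUGE-1 is an F1 score over unigram overlap between the generated and reference summaries. The function below is my own minimal simplification (it skips the stemming and tokenization details of the official ROUGE implementation) and is purely illustrative:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram-overlap F1 on whitespace tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In practice, the reported scores would be computed with a standard library such as 🤗 `evaluate`; this sketch only shows what the metric measures.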
---
## ⚙️ Training Details
- **Pretrained Model:** `facebook/bart-base`
- **Optimizer:** AdamW (default in Hugging Face Trainer)
- **Learning Rate:** 2e-5
- **Batch Size:** 4
- **Epochs:** 3
- **Scheduler:** Linear warmup
- **Loss Function:** Cross-entropy
- **Evaluation Metrics:** ROUGE (computed on validation set every 100 steps)
- **Evaluation Strategy:** Step-based evaluation (`eval_steps=100`)
- **Weight Decay:** 0.01
- **Mixed Precision (fp16):** Enabled when CUDA is available
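Assembled into a 🤗 Transformers configuration, the hyperparameters above correspond roughly to the sketch below. This is a config fragment, not the authors' actual training script: the warmup size is not stated in the card (`warmup_ratio=0.1` is an assumption), and the argument is named `evaluation_strategy` in older `transformers` releases.

```python
import torch
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="bart-council-pt",       # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
    warmup_ratio=0.1,                   # linear warmup; exact size not given in the card
    eval_strategy="steps",              # "evaluation_strategy" in transformers < 4.41
    eval_steps=100,
    fp16=torch.cuda.is_available(),     # mixed precision only when CUDA is present
    predict_with_generate=True,         # generate summaries so ROUGE can be computed
)
# These arguments would then be passed to a Seq2SeqTrainer together with the
# tokenized minutes→summary dataset and a DataCollatorForSeq2Seq.
```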
---
## 📚 Dataset Description
The model was trained on a specialized dataset of **Portuguese municipal meeting minutes**, consisting of:
- Discussion subjects from official municipal meeting minutes
- Decisions and deliberations across departments (urban planning, finance, education, etc.)
- Expert-annotated summaries for each discussion segment
**Dataset sources include:**
- Meeting minutes from six Portuguese municipalities
---
## ⚠️ Limitations
- **Language Restriction:** The model is optimized for Portuguese; performance may degrade in other languages.
- **Domain Dependence:** Best suited for administrative and institutional texts; less effective on informal or creative writing.
- **Length Sensitivity:** Very long transcripts (>1024 tokens) are truncated; chunking may be needed for full documents.
- **Generalization:** While robust within-domain, it may underperform on unseen domains or vocabulary.
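For documents beyond the 1024-token limit, one workaround is to split the text with an overlapping sliding window and summarize each chunk separately. The helper below is a minimal word-level sketch, not part of the model (word counts only approximate token counts, and the defaults are arbitrary):

```python
def chunk_words(text: str, max_words: int = 700, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-level chunks for per-chunk summarization."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already covers the end of the document
    return chunks
```

Each chunk would then be passed through the tokenizer and `model.generate` as in the usage example, and the per-chunk summaries concatenated or summarized again.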
---
## 📄 License
This model is released under the
**Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).**
---