anonymous12321 committed on
Commit 81eb846 · verified · 1 Parent(s): a4480db

Update README.md

Files changed (1)
  1. README.md +9 -16
README.md CHANGED

@@ -22,7 +22,7 @@ base_model:
  **Primera-Summarization-Council-PT** is an **abstractive text summarization model** based on **primera**, fine-tuned to produce concise and informative summaries of discussion subjects from **Portuguese municipal meeting minutes**.
  The model was trained on a curated and annotated corpus of official municipal meeting minutes covering a variety of administrative and political topics at the municipal level.
 
- **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/CitilinkSumm-PT)
+ **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/Citilink-Summ-PT)
 
  ### Key Features
 
@@ -59,7 +59,7 @@ The model receives a discussion subject of a municipal meeting and outputs a sho
  ```python
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
- model_name = "anonymous12321/CitilinkSumm-PT"
+ model_name = "anonymous12321/Primera-Summarization-Council-PT"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
 
@@ -86,16 +86,16 @@ print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
 
  | Metric | Score | Description |
  |:-------|:------:|:------------|
- | **ROUGE-1** | ... | Unigram overlap between generated and reference summaries |
- | **ROUGE-2** | ... | Bigram overlap |
- | **ROUGE-L** | ... | Longest common subsequence overlap |
- | **BERTScore (F1)** | ... | Semantic similarity between summary and reference |
+ | **ROUGE-1** | 0.632 | Unigram overlap between generated and reference summaries |
+ | **ROUGE-2** | 0.500 | Bigram overlap |
+ | **ROUGE-L** | 0.577 | Longest common subsequence overlap |
+ | **BERTScore (F1)** | 0.846 | Semantic similarity between summary and reference |
 
  ---
 
  ## ⚙️ Training Details
 
- - **Pretrained Model:** `facebook/bart-base`
+ - **Pretrained Model:** `allenai/primera`
  - **Optimizer:** AdamW (default in Hugging Face Trainer)
  - **Learning Rate:** 2e-5
  - **Batch Size:** 4
@@ -106,6 +106,8 @@ print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
  - **Evaluation Strategy:** Step-based evaluation (`eval_steps=100`)
  - **Weight Decay:** 0.01
  - **Mixed Precision (fp16):** Enabled when CUDA is available
+ - **Chunking:** Implemented with `max_length=512` and `stride=256` for hierarchical input segmentation
+ - **Target (summary) Max Length:** 128 tokens
 
  ---
 
@@ -132,15 +134,6 @@ The model was trained on a specialized dataset of **Portuguese municipal meeting
 
  ---
 
- ## ⚖️ Ethical Considerations
-
- The model is intended for **research and administrative document processing**.
-
- - Outputs should **not** be used for legal decision-making without human verification.
- - Potential bias may exist due to limited geographic and institutional diversity in training data.
-
- ---
-
  ## 📄 License
 
  This model is released under the
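
---

The commit adds a chunking entry (`max_length=512`, `stride=256`) to the training details without showing the implementation. As a minimal sketch, assuming the chunking simply splits a tokenized input into overlapping fixed-size windows (the function name `chunk_ids` and the sliding-window logic are illustrative assumptions, not the repo's actual code), it might look like:

```python
def chunk_ids(token_ids, max_length=512, stride=256):
    """Hypothetical sketch of the 'max_length=512, stride=256' chunking
    mentioned in the training details: split a token-id sequence into
    overlapping windows of at most max_length tokens, advancing by
    stride tokens each step. The model repo does not show its actual
    segmentation code."""
    if len(token_ids) <= max_length:
        return [token_ids]  # short inputs need no chunking
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # last window already reaches the end of the input
        start += stride
    return chunks

# Small numbers for readability: windows of 4 with stride 2.
print(chunk_ids(list(range(10)), max_length=4, stride=2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With `stride` set to half of `max_length`, each window overlaps its neighbor by 50%, so no token sits only at a window boundary.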