anonymous12321 committed on
Commit 81eb846 · verified · 1 Parent(s): a4480db

Update README.md

Files changed (1)
  1. README.md +9 -16
README.md CHANGED

@@ -22,7 +22,7 @@ base_model:
  **Primera-Summarization-Council-PT** is an **abstractive text summarization model** based on **primera**, fine-tuned to produce concise and informative summaries of discussion subjects from **Portuguese municipal meeting minutes**.
  The model was trained on a curated and annotated corpus of official municipal meeting minutes covering a variety of administrative and political topics at the municipal level.
 
- **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/CitilinkSumm-PT)
+ **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/Citilink-Summ-PT)
 
  ### Key Features
 
@@ -59,7 +59,7 @@ The model receives a discussion subject of a municipal meeting and outputs a sho
  ```python
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
- model_name = "anonymous12321/CitilinkSumm-PT"
+ model_name = "anonymous12321/Primera-Summarization-Council-PT"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
 
@@ -86,16 +86,16 @@ print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
 
  | Metric | Score | Description |
  |:-------|:------:|:------------|
- | **ROUGE-1** | ... | Unigram overlap between generated and reference summaries |
- | **ROUGE-2** | ... | Bigram overlap |
- | **ROUGE-L** | ... | Longest common subsequence overlap |
- | **BERTScore (F1)** | ... | Semantic similarity between summary and reference |
+ | **ROUGE-1** | 0.632 | Unigram overlap between generated and reference summaries |
+ | **ROUGE-2** | 0.500 | Bigram overlap |
+ | **ROUGE-L** | 0.577 | Longest common subsequence overlap |
+ | **BERTScore (F1)** | 0.846 | Semantic similarity between summary and reference |
 
  ---
 
  ## ⚙️ Training Details
 
- - **Pretrained Model:** `facebook/bart-base`
+ - **Pretrained Model:** `allenai/primera`
  - **Optimizer:** AdamW (default in Hugging Face Trainer)
  - **Learning Rate:** 2e-5
  - **Batch Size:** 4
@@ -106,6 +106,8 @@ print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
  - **Evaluation Strategy:** Step-based evaluation (`eval_steps=100`)
  - **Weight Decay:** 0.01
  - **Mixed Precision (fp16):** Enabled when CUDA is available
+ - **Chunking:** Implemented with `max_length=512` and `stride=256` for hierarchical input segmentation
+ - **Target (summary) Max Length:** 128 tokens
 
  ---
 
@@ -132,15 +134,6 @@ The model was trained on a specialized dataset of **Portuguese municipal meeting
 
  ---
 
- ## ⚖️ Ethical Considerations
-
- The model is intended for **research and administrative document processing**.
-
- - Outputs should **not** be used for legal decision-making without human verification.
- - Potential bias may exist due to limited geographic and institutional diversity in training data.
-
- ---
-
  ## 📄 License
 
  This model is released under the
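
---

The commit adds a chunking entry (`max_length=512`, `stride=256`) to the training details without showing the implementation. As a minimal sketch, assuming the chunking simply splits a tokenized input into overlapping fixed-size windows (the function name `chunk_ids` and the sliding-window logic are illustrative assumptions, not the repo's actual code), it might look like:

```python
def chunk_ids(token_ids, max_length=512, stride=256):
    """Hypothetical sketch of the 'max_length=512, stride=256' chunking
    mentioned in the training details: split a token-id sequence into
    overlapping windows of at most max_length tokens, advancing by
    stride tokens each step. The model repo does not show its actual
    segmentation code."""
    if len(token_ids) <= max_length:
        return [token_ids]  # short inputs need no chunking
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # last window already reaches the end of the input
        start += stride
    return chunks

# Small numbers for readability: windows of 4 with stride 2.
print(chunk_ids(list(range(10)), max_length=4, stride=2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With `stride` set to half of `max_length`, each window overlaps its neighbor by 50%, so no token sits only at a window boundary.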