Update README.md
README.md
CHANGED
|
@@ -10,7 +10,7 @@ tags:
|
|
| 10 |
- multilabel-classification
|
| 11 |
- portuguese
|
| 12 |
- administrative-documents
|
| 13 |
-
-
|
| 14 |
- ensemble-learning
|
| 15 |
- bert
|
| 16 |
- tfidf
|
|
@@ -19,36 +19,34 @@ base_model:
|
|
| 19 |
- neuralmind/bert-base-portuguese-cased
|
| 20 |
---
|
| 21 |
|
| 22 |
-
#
|
| 23 |
|
| 24 |
## Model Description
|
| 25 |
|
| 26 |
-
**
|
| 27 |
|
| 28 |
**Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/PT-AdminDocs-Classifier)
|
| 29 |
|
| 30 |
### Key Features
|
| 31 |
|
| 32 |
-
- 🧠 **
|
| 33 |
- **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
|
| 34 |
-
- 🇵🇹 **Portuguese Optimized**:
|
| 35 |
-
- ⚡ **High Performance**: F1-macro score of 0.5486
|
| 36 |
- 🔢 **22 Categories**: Comprehensive municipal administrative document classification
|
| 37 |
- 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
|
| 38 |
|
| 39 |
## Model Details
|
| 40 |
|
| 41 |
-
- **Architecture**:
|
| 42 |
- **Base Models**: 12 diverse classifiers (LogReg, Random Forest, Gradient Boosting)
|
| 43 |
- **Feature Engineering**: TF-IDF + BERTimbau embeddings + Statistical features
|
| 44 |
-
- **Meta-Learner**:
|
| 45 |
-
- **Categories**: 22 Portuguese administrative
|
| 46 |
- **Training Method**: Cross-validation stacking with dynamic threshold optimization
|
| 47 |
-
- **Framework**: Scikit-learn + Transformers
|
| 48 |
|
| 49 |
## How It Works
|
| 50 |
|
| 51 |
-
The Intelligent Stacking system operates in multiple stages:
|
| 52 |
|
| 53 |
1. **Feature Extraction**: Three complementary feature sets
|
| 54 |
- TF-IDF vectorization (word and character n-grams)
|
|
@@ -60,7 +58,7 @@ The Intelligent Stacking system operates in multiple stages:
|
|
| 60 |
- Random Forest
|
| 61 |
- Gradient Boosting
|
| 62 |
|
| 63 |
-
3. **Meta-Learning**:
|
| 64 |
|
| 65 |
4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
|
| 66 |
|
|
@@ -125,7 +123,7 @@ print("Predicted categories:", predicted_labels)
|
|
| 125 |
|
| 126 |
## Categories
|
| 127 |
|
| 128 |
-
The model classifies
|
| 129 |
|
| 130 |
| Category | Portuguese Name |
|
| 131 |
|----------|-----------------|
|
|
@@ -197,7 +195,7 @@ The model was trained on a curated dataset of Portuguese municipal council meeti
|
|
| 197 |
## Limitations
|
| 198 |
|
| 199 |
- **Language Specificity**: Optimized for Portuguese administrative language
|
| 200 |
-
- **Domain Focus**: Best performance on
|
| 201 |
- **Computational Requirements**: Requires significant memory for all model components
|
| 202 |
- **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
|
| 203 |
- **Class Imbalance**: Some categories may have lower precision due to limited training examples
|
|
|
|
| 10 |
- multilabel-classification
|
| 11 |
- portuguese
|
| 12 |
- administrative-documents
|
| 13 |
+
- stacking
|
| 14 |
- ensemble-learning
|
| 15 |
- bert
|
| 16 |
- tfidf
|
|
|
|
| 19 |
- neuralmind/bert-base-portuguese-cased
|
| 20 |
---
|
| 21 |
|
| 22 |
+
# CouncilTopics-PT: A multilabel classifier for Portuguese municipal meeting topics
|
| 23 |
|
| 24 |
## Model Description
|
| 25 |
|
| 26 |
+
**CouncilTopics-PT** is an ensemble learning system specialized in multilabel classification of topics in Portuguese municipal meeting minutes. The model combines 12 base models with a meta-learner to achieve usable performance on municipal topic categorization tasks.
|
| 27 |
|
| 28 |
**Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/PT-AdminDocs-Classifier)
|
| 29 |
|
| 30 |
### Key Features
|
| 31 |
|
| 32 |
+
- 🧠 **Meta-Learning**: Ensemble combination using stacked generalization
|
| 33 |
- **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
|
| 34 |
+
- 🇵🇹 **Portuguese Optimized**: Built on BERTimbau embeddings and tuned for Portuguese-language text
|
|
|
|
| 35 |
- 🔢 **22 Categories**: Comprehensive municipal administrative document classification
|
| 36 |
- 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
|
| 37 |
|
| 38 |
## Model Details
|
| 39 |
|
| 40 |
+
- **Architecture**: Stacking with Meta-Learning
|
| 41 |
- **Base Models**: 12 diverse classifiers (LogReg, Random Forest, Gradient Boosting)
|
| 42 |
- **Feature Engineering**: TF-IDF + BERTimbau embeddings + Statistical features
|
| 43 |
+
- **Meta-Learner**: Ensemble combination algorithm
|
| 44 |
+
- **Categories**: 22 Portuguese administrative topic labels
|
| 45 |
- **Training Method**: Cross-validation stacking with dynamic threshold optimization
|
|
|
|
| 46 |
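The cross-validation stacking named in the details above can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn; the synthetic data, the two base models, the `out_of_fold_proba` helper, and the logistic-regression meta-learner are all hypothetical stand-ins, not the model's actual configuration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for document features: 200 samples, 20 features, 4 labels.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)


def out_of_fold_proba(model, X, Y, n_splits=3):
    """Per-label probabilities for each sample, predicted by a model
    that never saw that sample during fitting."""
    oof = np.zeros(Y.shape, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in folds.split(X):
        fitted = clone(model).fit(X[train_idx], Y[train_idx])
        oof[val_idx] = fitted.predict_proba(X[val_idx])
    return oof


# Two illustrative base models (the real system is described as using 12).
base_models = [
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    OneVsRestClassifier(RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Stack the out-of-fold probabilities side by side as meta-features.
meta_features = np.hstack([out_of_fold_proba(m, X, Y) for m in base_models])

# The meta-learner is trained on base-model outputs, not on raw features.
meta_learner = OneVsRestClassifier(LogisticRegression(max_iter=1000))
meta_learner.fit(meta_features, Y)
meta_proba = meta_learner.predict_proba(meta_features)
```

Using out-of-fold predictions keeps the meta-learner from training on probabilities that the base models produced for samples they had already memorized.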
|
| 47 |
## How It Works
|
| 48 |
|
| 49 |
+
The CouncilTopics-PT system operates in multiple stages:
|
| 50 |
|
| 51 |
1. **Feature Extraction**: Three complementary feature sets
|
| 52 |
- TF-IDF vectorization (word and character n-grams)
|
|
|
|
| 58 |
- Random Forest
|
| 59 |
- Gradient Boosting
|
| 60 |
|
| 61 |
+
3. **Meta-Learning**: Combination of base model predictions using stacking
|
| 62 |
|
| 63 |
4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
|
| 64 |
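The per-category thresholds in stage 4 could be tuned along these lines. This is a sketch only: the `tune_thresholds` helper, the threshold grid, the toy data, and the choice of per-label F1 as the tuning criterion are assumptions made for illustration, not the procedure the model is documented to use.

```python
import numpy as np
from sklearn.metrics import f1_score


def tune_thresholds(y_true, y_proba, grid=None):
    """Pick, for each label, the grid threshold with the highest F1
    on held-out data (the first such threshold if several tie)."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    thresholds = np.full(y_true.shape[1], 0.5)
    for j in range(y_true.shape[1]):
        scores = [
            f1_score(y_true[:, j], (y_proba[:, j] >= t).astype(int), zero_division=0)
            for t in grid
        ]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds


# Toy validation set: 4 documents, 2 labels (hypothetical values).
y_true = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])
y_proba = np.array([[0.42, 0.11], [0.37, 0.22], [0.12, 0.88], [0.07, 0.33]])

thresholds = tune_thresholds(y_true, y_proba)
# Broadcasting applies each label's own threshold to its column.
y_pred = (y_proba >= thresholds).astype(int)
```

In this toy data a fixed 0.5 cutoff would miss the first label in both documents that carry it (0.42 and 0.37), while a lower per-label threshold recovers them.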
|
|
|
|
| 123 |
|
| 124 |
## Categories
|
| 125 |
|
| 126 |
+
The model classifies topics into 22 Portuguese administrative categories:
|
| 127 |
|
| 128 |
| Category | Portuguese Name |
|
| 129 |
|----------|-----------------|
|
|
|
|
| 195 |
## Limitations
|
| 196 |
|
| 197 |
- **Language Specificity**: Optimized for Portuguese administrative language
|
| 198 |
+
- **Domain Focus**: Best performance on municipal documents
|
| 199 |
- **Computational Requirements**: Requires significant memory for all model components
|
| 200 |
- **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
|
| 201 |
- **Class Imbalance**: Some categories may have lower precision due to limited training examples
|