anonymous12321 committed on
Commit f60cf4d · verified · 1 Parent(s): c5c292b

Update README.md

Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
10
  - multilabel-classification
11
  - portuguese
12
  - administrative-documents
13
- - intelligent-stacking
14
  - ensemble-learning
15
  - bert
16
  - tfidf
@@ -19,36 +19,34 @@ base_model:
19
  - neuralmind/bert-base-portuguese-cased
20
  ---
21
 
22
- # Intelligent Stacking: Multilabel Portuguese Administrative Document Classifier
23
 
24
  ## Model Description
25
 
26
- **Intelligent Stacking** is an advanced ensemble learning system specialized in multilabel classification of Portuguese administrative documents. The model combines 12 base models with intelligent meta-learning to achieve high performance on municipal document categorization tasks.
27
 
28
  **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/PT-AdminDocs-Classifier)
29
 
30
  ### Key Features
31
 
32
- - 🧠 **Intelligent Meta-Learning**: Advanced ensemble combination using stacked generalization
33
  - 📚 **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
34
- - 🇵🇹 **Portuguese Optimized**: Fine-tuned for Portuguese administrative language
35
- - ⚡ **High Performance**: F1-macro score of 0.5486
36
  - 🏢 **22 Categories**: Comprehensive municipal administrative document classification
37
  - 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
38
 
39
  ## Model Details
40
 
41
- - **Architecture**: Intelligent Stacking with Meta-Learning
42
  - **Base Models**: 12 diverse classifiers (LogReg, Random Forest, Gradient Boosting)
43
  - **Feature Engineering**: TF-IDF + BERTimbau embeddings + Statistical features
44
- - **Meta-Learner**: Advanced ensemble combination algorithm
45
- - **Categories**: 22 Portuguese administrative document types
46
  - **Training Method**: Cross-validation stacking with dynamic threshold optimization
47
- - **Framework**: Scikit-learn + Transformers
48
 
49
  ## How It Works
50
 
51
- The Intelligent Stacking system operates in multiple stages:
52
 
53
  1. **Feature Extraction**: Three complementary feature sets
54
  - TF-IDF vectorization (word and character n-grams)
@@ -60,7 +58,7 @@ The Intelligent Stacking system operates in multiple stages:
60
  - Random Forest
61
  - Gradient Boosting
62
 
63
- 3. **Meta-Learning**: Intelligent combination of base model predictions using advanced stacking
64
 
65
  4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
66
 
@@ -125,7 +123,7 @@ print("Predicted categories:", predicted_labels)
125
 
126
  ## Categories
127
 
128
- The model classifies documents into 22 Portuguese administrative categories:
129
 
130
  | Category | Portuguese Name |
131
  |----------|-----------------|
@@ -197,7 +195,7 @@ The model was trained on a curated dataset of Portuguese municipal council meeti
197
  ## Limitations
198
 
199
  - **Language Specificity**: Optimized for Portuguese administrative language
200
- - **Domain Focus**: Best performance on governmental/municipal documents
201
  - **Computational Requirements**: Requires significant memory for all model components
202
  - **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
203
  - **Class Imbalance**: Some categories may have lower precision due to limited training examples
 
10
  - multilabel-classification
11
  - portuguese
12
  - administrative-documents
13
+ - stacking
14
  - ensemble-learning
15
  - bert
16
  - tfidf
 
19
  - neuralmind/bert-base-portuguese-cased
20
  ---
21
 
22
+ # CouncilTopics-PT: A Multilabel Classifier for Portuguese Municipal Meeting Topics
23
 
24
  ## Model Description
25
 
26
+ **CouncilTopics-PT** is an ensemble learning system for multilabel classification of the topics discussed in Portuguese municipal meeting minutes. The model combines 12 base models through a stacked meta-learner to achieve usable performance on municipal topic categorization tasks.
27
 
28
  **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/PT-AdminDocs-Classifier)
29
 
30
  ### Key Features
31
 
32
+ - 🧠 **Meta-Learning**: Ensemble combination using stacked generalization
33
  - 📚 **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
34
+ - 🇵🇹 **Portuguese Optimized**: Prepared for the Portuguese language
 
35
  - 🏢 **22 Categories**: Comprehensive municipal administrative document classification
36
  - 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
37
 
38
  ## Model Details
39
 
40
+ - **Architecture**: Stacking with Meta-Learning
41
  - **Base Models**: 12 diverse classifiers (LogReg, Random Forest, Gradient Boosting)
42
  - **Feature Engineering**: TF-IDF + BERTimbau embeddings + Statistical features
43
+ - **Meta-Learner**: Ensemble combination algorithm
44
+ - **Categories**: 22 Portuguese administrative topic labels
45
  - **Training Method**: Cross-validation stacking with dynamic threshold optimization
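The "cross-validation stacking" training method can be illustrated with a minimal, dependency-free sketch. It is not the repository's training code: the toy label-prior base model, the helper names (`kfold_indices`, `out_of_fold_predictions`, `stack_meta_features`), and the sample data are all assumptions made for illustration. The point it shows is that each document's meta-features come from base models that never saw that document during fitting.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, holdout) index lists for contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        holdout = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, holdout
        start += size


def out_of_fold_predictions(fit, predict, X, Y, n_folds=3):
    """Predict every sample with a model fitted on the other folds only."""
    oof = [None] * len(X)
    for train, holdout in kfold_indices(len(X), n_folds):
        model = fit([X[i] for i in train], [Y[i] for i in train])
        for i in holdout:
            oof[i] = predict(model, X[i])
    return oof


def stack_meta_features(per_model_oof):
    """Concatenate the per-label scores of all base models, sample by sample."""
    return [[p for row in rows for p in row] for rows in zip(*per_model_oof)]


# Hypothetical stand-in for a real base classifier: it ignores the text and
# predicts the training-set frequency of each label.
def fit_label_prior(X_train, Y_train):
    n_labels = len(Y_train[0])
    return [sum(row[j] for row in Y_train) / len(Y_train) for j in range(n_labels)]


def predict_label_prior(model, x):
    return list(model)


# Toy data: six placeholder documents, two labels (the real model has 22).
docs = ["doc0", "doc1", "doc2", "doc3", "doc4", "doc5"]
labels = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]

oof_a = out_of_fold_predictions(fit_label_prior, predict_label_prior, docs, labels)
# Two copies of the same toy model stand in for different base models.
meta_features = stack_meta_features([oof_a, oof_a])
print(meta_features[0])  # [0.25, 0.5, 0.25, 0.5]
```

With 12 base models and 22 labels, the same concatenation would give the meta-learner 264 input columns per document, assuming every base model emits one score per label.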
 
46
 
47
  ## How It Works
48
 
49
+ The CouncilTopics-PT system operates in multiple stages:
50
 
51
  1. **Feature Extraction**: Three complementary feature sets
52
  - TF-IDF vectorization (word and character n-grams)
 
58
  - Random Forest
59
  - Gradient Boosting
60
 
61
+ 3. **Meta-Learning**: Combination of base model predictions using stacking
62
 
63
  4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
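Stage 4 can be sketched as a per-label grid search over decision thresholds. This is an illustrative sketch, not the model's actual tuning procedure: the F1 objective, the 0.1 to 0.9 grid, the label names `label_a` and `label_b`, and the validation scores are all assumptions.

```python
def f1_binary(y_true, y_pred):
    """F1 for one label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def tune_thresholds(probas, Y, grid):
    """Pick, for each label, the lowest grid value with the best validation F1."""
    thresholds = []
    for j in range(len(Y[0])):
        y_true = [row[j] for row in Y]
        scores = [row[j] for row in probas]
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            f1 = f1_binary(y_true, [1 if s >= t else 0 for s in scores])
            if f1 > best_f1:  # strict comparison keeps the first best value
                best_t, best_f1 = t, f1
        thresholds.append(best_t)
    return thresholds


def predict_labels(proba_row, thresholds, label_names):
    """Return every label whose score reaches its own threshold."""
    return [n for n, p, t in zip(label_names, proba_row, thresholds) if p >= t]


# Hypothetical validation scores for two labels; the second label is
# systematically under-scored, as can happen with a rare category.
label_names = ["label_a", "label_b"]
val_probas = [[0.92, 0.31], [0.81, 0.36], [0.43, 0.12], [0.22, 0.04]]
val_labels = [[1, 1], [1, 1], [0, 0], [0, 0]]
grid = [i / 10 for i in range(1, 10)]

thresholds = tune_thresholds(val_probas, val_labels, grid)
print(thresholds)  # [0.5, 0.2]
print(predict_labels([0.45, 0.25], thresholds, label_names))  # ['label_b']
```

A single global cutoff of 0.5 would return no labels for the last example, which is the situation per-category thresholds are meant to avoid.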
64
 
 
123
 
124
  ## Categories
125
 
126
+ The model classifies topics into 22 Portuguese administrative categories:
127
 
128
  | Category | Portuguese Name |
129
  |----------|-----------------|
 
195
  ## Limitations
196
 
197
  - **Language Specificity**: Optimized for Portuguese administrative language
198
+ - **Domain Focus**: Best performance on municipal documents
199
  - **Computational Requirements**: Requires significant memory for all model components
200
  - **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
201
  - **Class Imbalance**: Some categories may have lower precision due to limited training examples