anonymous12321/Council_Topics_Classifier_PT

@@ -23,16 +23,16 @@ base_model:
 ## Model Description
-**Municipal Topics Classifier** is an ensemble machine learning system specialized in **multi-label topic classification** for Portuguese municipal council meeting minutes. The model combines Gradient Boosting with Active Learning and BERTimbau embeddings to identify multiple simultaneous topics within administrative texts, making it particularly effective for categorizing complex governmental content.
-🚀 **Try out the model:** [Hugging Face Space Demo](#)
 ## Key Features
-- 🎯 **Specialized for Municipal Topics**: Trained on Portuguese council meeting minutes with domain-specific preprocessing
 - 🏆 **Advanced Ensemble**: Combines LogisticRegression + 3x GradientBoosting models with adaptive weighting
 - 🧠 **Deep + Classical Features**: Merges TF-IDF vectors (10k features) with BERTimbau embeddings (768 dims)
-- 📊 **Multi-Label Classification**: Identifies multiple co-occurring topics per text
 - ⚡ **Optimized Thresholds**: Dynamic per-label thresholds tuned on validation data
 - 🔄 **Active Learning Ready**: Adaptive weighting based on label frequency for continuous improvement
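
The weighted ensemble and per-label thresholds listed above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the fixed per-model weights, and the toy probabilities are all assumptions made for illustration, and the card's "adaptive weighting based on label frequency" may work differently.

```python
import numpy as np

def ensemble_probabilities(model_probs, weights):
    """Weighted average of per-model probability matrices.

    model_probs: list of arrays, each shaped (n_samples, n_labels).
    weights: one non-negative weight per model (hypothetical values).
    """
    stacked = np.stack(model_probs, axis=0)   # (n_models, n_samples, n_labels)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalize so the weights sum to 1
    return np.tensordot(w, stacked, axes=1)   # (n_samples, n_labels)

def apply_thresholds(probs, thresholds):
    """Turn probabilities into a 0/1 label matrix using one threshold per label."""
    return (probs >= np.asarray(thresholds)).astype(int)

# Toy input: one document, three labels, two models (made-up numbers).
p_lr = np.array([[0.9, 0.2, 0.4]])
p_gb = np.array([[0.7, 0.6, 0.2]])

probs = ensemble_probabilities([p_lr, p_gb], weights=[1.0, 3.0])
labels = apply_thresholds(probs, thresholds=[0.5, 0.4, 0.5])
print(probs, labels)
```

Because each label has its own threshold, a rare label can be assigned at a lower probability than a frequent one, which is the usual motivation for tuning thresholds per label on validation data.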
@@ -94,22 +94,6 @@ Ambiente (Confidence: 54%)
 ## Usage
-### Quick Start with Streamlit Demo
-```bash
-# Clone the repository
-git clone https://huggingface.co/spaces/YOUR_USERNAME/municipal-topics-classifier
-cd municipal-topics-classifier
-# Install dependencies
-pip install -r requirements.txt
-# Run the Streamlit app
-streamlit run app.py
-```
-### Programmatic Usage
 ```python
 import numpy as np
 from joblib import load
@@ -164,49 +148,18 @@ print(f"Predicted Topics: {predicted_labels}")
 | **Subset Accuracy** | 0.45 |
 | **Average Precision** | 0.79 |
-### Per-Label Performance (Top Categories)
-| Label | Precision | Recall | F1-Score | Support |
-|-------|-----------|--------|----------|---------|
-| Orçamento e Finanças | 0.88 | 0.85 | 0.86 | 145 |
-| Obras Públicas | 0.84 | 0.81 | 0.82 | 132 |
-| Recursos Humanos | 0.79 | 0.76 | 0.77 | 98 |
-| Educação | 0.82 | 0.78 | 0.80 | 87 |
-| Ambiente | 0.75 | 0.72 | 0.73 | 76 |
-### Ensemble Performance vs. Individual Models
-| Model | Micro F1 | Macro F1 |
-|-------|----------|----------|
-| LogisticRegression | 0.76 | 0.68 |
-| GradientBoosting #1 | 0.78 | 0.70 |
-| GradientBoosting #2 | 0.79 | 0.71 |
-| GradientBoosting #3 | 0.80 | 0.72 |
-| **Adaptive Ensemble** | **0.82** | **0.74** |
 ## Dataset
 The model was trained on a curated dataset of Portuguese municipal council meeting minutes:
-- **Documents**: 2,500+ meeting minutes
-- **Time Period**: 2018-2024
 - **Source**: Portuguese municipalities (anonymized)
-- **Labels**: 25 topic categories
-- **Annotation**: Multi-label (avg. 2.3 labels per document)
 - **Split**: 60% train / 20% validation / 20% test
-### Label Distribution
-Common topics include:
-- Orçamento e Finanças (Budget & Finance)
-- Obras Públicas (Public Works)
-- Recursos Humanos (Human Resources)
-- Educação (Education)
-- Ambiente (Environment)
-- Saúde (Health)
-- Transportes (Transportation)
-- Urbanismo (Urban Planning)
 ## Training Details
 ### Preprocessing
@@ -253,45 +206,15 @@ Common topics include:
 ## Limitations
-- **Language Specificity**: Optimized for Portuguese; other languages not supported
 - **Domain Focus**: Best performance on municipal/administrative texts
-- **Label Set**: Fixed to 25 predefined categories (not extensible without retraining)
-- **Context Length**: BERTimbau limited to 512 tokens (long documents are truncated)
 - **Rare Topics**: Lower performance on infrequent labels (<20 training examples)
 - **Ambiguous Cases**: May over-predict for texts with multiple overlapping themes
-## Model Card Contact
-For questions, feedback, or collaboration:
-- 📧 Email: [your-email@example.com]
-- 🐛 Issues: [GitHub Issues](#)
-- 💬 Discussions: [Hugging Face Discussions](#)
-## Citation
-If you use this model in your research, please cite:
-```bibtex
-@misc{municipal-topics-classifier,
-  author = {Your Name},
-  title = {Municipal Topics Classifier: Multi-Label Topic Classification for Portuguese Council Texts},
-  year = {2024},
-  publisher = {Hugging Face},
-  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/municipal-topics-classifier}}
-}
-```
 ## License
 This model is released under the **Attribution-NonCommercial-NoDerivatives 4.0 International** (CC BY-NC-ND 4.0).
-- ✅ **Allowed**: Non-commercial use, redistribution with attribution
-- ❌ **Not Allowed**: Commercial use, modifications, derivative works
-## Acknowledgments
-- **BERTimbau**: neuralmind/bert-base-portuguese-cased
-- **Framework**: Hugging Face Transformers, Scikit-learn
-- **Dataset**: Portuguese municipalities (anonymized)
 ---

 ## Model Description
+**Municipal Topics Classifier** is an ensemble machine learning system specialized in **multi-label topic classification** for Portuguese municipal council meeting minutes. The model combines Gradient Boosting with Active Learning and BERTimbau embeddings to identify multiple simultaneous topics within municipal discussion subjects, making it particularly effective for categorizing complex governmental content.
+🚀 **Try out the model:** [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/GB_CouncilTopics-PT)
 ## Key Features
+- 🎯 **Specialized for Municipal Topics**: Trained on discussion subjects from Portuguese council meeting minutes, with domain-specific preprocessing
 - 🏆 **Advanced Ensemble**: Combines LogisticRegression + 3x GradientBoosting models with adaptive weighting
 - 🧠 **Deep + Classical Features**: Merges TF-IDF vectors (10k features) with BERTimbau embeddings (768 dims)
+- 📊 **Multi-Label Classification**: Identifies multiple co-occurring topics per subject
 - ⚡ **Optimized Thresholds**: Dynamic per-label thresholds tuned on validation data
 - 🔄 **Active Learning Ready**: Adaptive weighting based on label frequency for continuous improvement
 ## Usage
 ```python
 import numpy as np
 from joblib import load
 | **Subset Accuracy** | 0.45 |
 | **Average Precision** | 0.79 |
 ## Dataset
 The model was trained on a curated dataset of Portuguese municipal council meeting minutes:
+- **Documents**: 2,500+ discussion subjects from meeting minutes
+- **Time Period**: 2021-2024
 - **Source**: Portuguese municipalities (anonymized)
+- **Labels**: 22 topic categories
+- **Annotation**: Multi-label (avg. 1.69 labels per document)
 - **Split**: 60% train / 20% validation / 20% test
 ## Training Details
 ### Preprocessing
 ## Limitations
+- **Language Specificity**: Optimized for Portuguese
 - **Domain Focus**: Best performance on municipal/administrative texts
+- **Label Set**: Fixed to 22 predefined categories
 - **Rare Topics**: Lower performance on infrequent labels (<20 training examples)
 - **Ambiguous Cases**: May over-predict for texts with multiple overlapping themes
 ## License
 This model is released under the **Attribution-NonCommercial-NoDerivatives 4.0 International** (CC BY-NC-ND 4.0).
 ---