Update README.md
README.md (changed)

@@ -6,7 +6,6 @@ colorTo: blue
 sdk: docker
 app_port: 8501
 tags:
-- streamlit
 - text-classification
 - multilabel-classification
 - portuguese
@@ -24,7 +23,7 @@ base_model:
 
 ## Model Description
 
-**Intelligent Stacking** is an advanced ensemble learning system specialized in multilabel classification of Portuguese administrative documents. The model combines 12 base models with intelligent meta-learning to achieve
+**Intelligent Stacking** is an advanced ensemble learning system specialized in multilabel classification of Portuguese administrative documents. The model combines 12 base models with intelligent meta-learning to achieve high performance on municipal and governmental document categorization tasks.
 
 **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/YOUR_USERNAME/intelligent-stacking-demo)
 
@@ -33,7 +32,7 @@ base_model:
 - 🧠 **Intelligent Meta-Learning**: Advanced ensemble combination using stacked generalization
 - 📚 **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
 - 🇵🇹 **Portuguese Optimized**: Fine-tuned for Portuguese administrative language
-- ⚡ **High Performance**: F1-macro score of 0.5486
+- ⚡ **High Performance**: F1-macro score of 0.5486
 - 🏢 **22 Categories**: Comprehensive municipal administrative document classification
 - 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
 
@@ -65,71 +64,6 @@ The Intelligent Stacking system operates in multiple stages:
 
 4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
 
-## Usage
-
-### Quick Start with Python
-
-```python
-import joblib
-import numpy as np
-from sklearn.feature_extraction.text import TfidfVectorizer
-from scipy.sparse import hstack, csr_matrix
-
-# Load the model components
-tfidf_vectorizer = joblib.load("int_stacking_tfidf_vectorizer.joblib")
-meta_learner = joblib.load("int_stacking_meta_learner.joblib")
-mlb_encoder = joblib.load("int_stacking_mlb_encoder.joblib")
-base_models = joblib.load("int_stacking_base_models.joblib")
-optimal_thresholds = np.load("int_stacking_optimal_thresholds.npy")
-
-# Prepare text
-text = """CONTRATO DE PRESTAÇÃO DE SERVIÇOS
-Entre a Administração Pública Municipal e a empresa contratada,
-fica estabelecido o presente contrato para prestação de serviços
-de manutenção e conservação de vias públicas."""
-
-# Extract features
-tfidf_features = tfidf_vectorizer.transform([text])
-
-# Generate base model predictions
-base_predictions = np.zeros((1, len(mlb_encoder.classes_), 12))
-model_idx = 0
-
-for feat_name in ["TF-IDF", "BERT", "TF-IDF+BERT"]:
-    for algo_name in ["LogReg_C1", "LogReg_C05", "GradBoost", "RandomForest"]:
-        model_key = f"{feat_name}_{algo_name}"
-        if model_key in base_models:
-            model = base_models[model_key]
-            pred = model.predict_proba(tfidf_features)
-            base_predictions[0, :, model_idx] = pred[0]
-        model_idx += 1
-
-# Meta-learner prediction
-meta_features = base_predictions.reshape(1, -1)
-meta_pred = meta_learner.predict_proba(meta_features)[0]
-
-# Apply dynamic thresholds
-predicted_labels = []
-for i, (prob, threshold) in enumerate(zip(meta_pred, optimal_thresholds)):
-    if prob > threshold:
-        predicted_labels.append({
-            "label": mlb_encoder.classes_[i],
-            "probability": float(prob),
-            "confidence": "high" if prob > 0.7 else "medium" if prob > 0.4 else "low"
-        })
-
-# Sort by probability
-predicted_labels.sort(key=lambda x: x["probability"], reverse=True)
-print("Predicted categories:", predicted_labels)
-```
-
-### Streamlit Demo
-
-The model includes a complete Streamlit web interface for easy testing:
-
-```bash
-streamlit run app.py
-```
 
 ## Categories
 
@@ -173,7 +107,6 @@ The model classifies documents into 22 Portuguese administrative categories:
 | **Hamming Loss** | **0.0426** | Label-wise error rate |
 | **Average Precision (macro)** | **0.608** | Macro-averaged AP |
 | **Average Precision (micro)** | **0.785** | Micro-averaged AP |
-| **Improvement** | **+54.7%** | Over Decision Tree baseline |
 
 
 ## Technical Architecture
@@ -212,19 +145,6 @@ The model was trained on a curated dataset of Portuguese administrative document
 - **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
 - **Class Imbalance**: Some categories may have lower precision due to limited training examples
 
-## Citation
-
-If you use this model in your research, please cite:
-
-```bibtex
-@article{intelligent_stacking_2024,
-  title={Intelligent Stacking for Multilabel Portuguese Administrative Document Classification},
-  author={[Your Name]},
-  journal={[Journal Name]},
-  year={2024},
-  note={Model available at https://huggingface.co/YOUR_USERNAME/intelligent-stacking}
-}
-```
 
 ## License
 
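The README context kept by this commit describes the last stage as "Dynamic Thresholds: Per-category optimized decision boundaries for multilabel output". Below is a minimal sketch of that step. It assumes the meta-learner produces one probability per category. The function `apply_thresholds`, the category names `cat_a`/`cat_b`/`cat_c` and all the numbers are hypothetical and illustrative only. They are not part of the model's released files.

```python
import numpy as np


def apply_thresholds(probs, thresholds, labels):
    """Keep every category whose probability exceeds its own threshold.

    probs, thresholds: 1-D sequences of equal length (one entry per category).
    labels: category names in the same order as probs/thresholds.
    Returns (label, probability) pairs sorted by probability, highest first.
    """
    probs = np.asarray(probs, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    if probs.shape != thresholds.shape or probs.shape[0] != len(labels):
        raise ValueError("probs, thresholds and labels must have the same length")
    # Strict '>' comparison, one threshold per category (not one global cutoff).
    selected = np.flatnonzero(probs > thresholds)
    pairs = [(labels[i], float(probs[i])) for i in selected]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical categories and values, for illustration only.
labels = ["cat_a", "cat_b", "cat_c"]
probs = [0.62, 0.35, 0.81]
thresholds = [0.50, 0.30, 0.90]
print(apply_thresholds(probs, thresholds, labels))
# [('cat_a', 0.62), ('cat_b', 0.35)]
```

A single global cutoff of 0.5 would give a different result here: `cat_b` would be dropped and `cat_c` kept. This dependence on the exact per-category values is the "Threshold Sensitivity" noted under limitations.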
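The README context says the per-category thresholds are "optimized", but the lines shown here do not say how. One common approach is a grid search that picks, for each category, the threshold maximizing F1 on a held-out validation split. That approach is assumed here, not taken from the project. The helpers `tune_thresholds` and `f1_binary` are hypothetical, and the toy data is invented for illustration.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 for a single category; 0.0 when there are no true or predicted positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def tune_thresholds(y_true, probs, grid=None):
    """For each category (column), pick the grid value that maximizes F1.

    y_true: (n_samples, n_categories) 0/1 matrix from a validation split.
    probs:  (n_samples, n_categories) predicted probabilities for that split.
    """
    y_true = np.asarray(y_true, dtype=int)
    probs = np.asarray(probs, dtype=float)
    if grid is None:
        grid = np.round(np.arange(0.05, 0.96, 0.05), 2)  # 0.05, 0.10, ..., 0.95
    best = np.empty(y_true.shape[1])
    for j in range(y_true.shape[1]):
        scores = [f1_binary(y_true[:, j], (probs[:, j] > t).astype(int)) for t in grid]
        best[j] = grid[int(np.argmax(scores))]  # ties go to the lowest threshold
    return best


# Toy validation data: 4 documents, 2 hypothetical categories.
y_val = [[1, 0], [1, 1], [0, 0], [0, 1]]
p_val = [[0.9, 0.30], [0.6, 0.35], [0.4, 0.10], [0.2, 0.25]]
print(tune_thresholds(y_val, p_val, grid=[0.1, 0.3, 0.5, 0.7]))
# [0.5 0.1]
```

The second category only reaches its best F1 at a low threshold, which a fixed 0.5 cutoff would miss. Tuning on the same predictions used to fit the meta-learner would likely overfit the thresholds. A separate validation split, or out-of-fold predictions, is the safer input.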
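The metrics table retained by this commit reports a Hamming loss of 0.0426, described as a "Label-wise error rate". Hamming loss is the fraction of individual (document, category) decisions that are wrong. So 0.0426 corresponds to roughly 4.3% of such decisions. Below is a minimal NumPy sketch on toy data. It is not the project's evaluation code. For 0/1 indicator matrices it should agree with scikit-learn's `sklearn.metrics.hamming_loss`.

```python
import numpy as np


def hamming_loss(y_true, y_pred):
    """Fraction of (document, category) decisions that are wrong."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true != y_pred))


# Two documents, three categories: one missed label and one spurious label.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(hamming_loss(y_true, y_pred))  # 2 wrong decisions out of 6
```

Because most documents carry only a few of the 22 categories, a low Hamming loss can coexist with a modest macro F1. This is why the table also reports macro and micro average precision.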