anonymous12321 committed on
Commit
6141800
·
verified ·
1 Parent(s): bf8931e

Update README.md

  1. README.md +2 -82
README.md CHANGED
@@ -6,7 +6,6 @@ colorTo: blue
6
  sdk: docker
7
  app_port: 8501
8
  tags:
9
- - streamlit
10
  - text-classification
11
  - multilabel-classification
12
  - portuguese
@@ -24,7 +23,7 @@ base_model:
24
 
25
  ## Model Description
26
 
27
- **Intelligent Stacking** is an advanced ensemble learning system specialized in multilabel classification of Portuguese administrative documents. The model combines 12 base models with intelligent meta-learning to achieve state-of-the-art performance on municipal and governmental document categorization tasks.
28
 
29
  **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/YOUR_USERNAME/intelligent-stacking-demo)
30
 
@@ -33,7 +32,7 @@ base_model:
33
  - 🧠 **Intelligent Meta-Learning**: Advanced ensemble combination using stacked generalization
34
  - 📚 **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
35
  - 🇵🇹 **Portuguese Optimized**: Fine-tuned for Portuguese administrative language
36
- - ⚡ **High Performance**: F1-macro score of 0.5486 with 54.7% improvement over baseline
37
  - 🏢 **22 Categories**: Comprehensive municipal administrative document classification
38
  - 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
39
 
@@ -65,71 +64,6 @@ The Intelligent Stacking system operates in multiple stages:
65
 
66
  4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
67
 
68
- ## Usage
69
-
70
- ### Quick Start with Python
71
-
72
- ```python
73
- import joblib
74
- import numpy as np
75
- from sklearn.feature_extraction.text import TfidfVectorizer
76
- from scipy.sparse import hstack, csr_matrix
77
-
78
- # Load the model components
79
- tfidf_vectorizer = joblib.load("int_stacking_tfidf_vectorizer.joblib")
80
- meta_learner = joblib.load("int_stacking_meta_learner.joblib")
81
- mlb_encoder = joblib.load("int_stacking_mlb_encoder.joblib")
82
- base_models = joblib.load("int_stacking_base_models.joblib")
83
- optimal_thresholds = np.load("int_stacking_optimal_thresholds.npy")
84
-
85
- # Prepare text
86
- text = """CONTRATO DE PRESTAÇÃO DE SERVIÇOS
87
- Entre a Administração Pública Municipal e a empresa contratada,
88
- fica estabelecido o presente contrato para prestação de serviços
89
- de manutenção e conservação de vias públicas."""
90
-
91
- # Extract features
92
- tfidf_features = tfidf_vectorizer.transform([text])
93
-
94
- # Generate base model predictions
95
- base_predictions = np.zeros((1, len(mlb_encoder.classes_), 12))
96
- model_idx = 0
97
-
98
- for feat_name in ["TF-IDF", "BERT", "TF-IDF+BERT"]:
99
-     for algo_name in ["LogReg_C1", "LogReg_C05", "GradBoost", "RandomForest"]:
100
-         model_key = f"{feat_name}_{algo_name}"
101
-         if model_key in base_models:
102
-             model = base_models[model_key]
103
-             pred = model.predict_proba(tfidf_features)
104
-             base_predictions[0, :, model_idx] = pred[0]
105
-         model_idx += 1
106
-
107
- # Meta-learner prediction
108
- meta_features = base_predictions.reshape(1, -1)
109
- meta_pred = meta_learner.predict_proba(meta_features)[0]
110
-
111
- # Apply dynamic thresholds
112
- predicted_labels = []
113
- for i, (prob, threshold) in enumerate(zip(meta_pred, optimal_thresholds)):
114
-     if prob > threshold:
115
-         predicted_labels.append({
116
-             "label": mlb_encoder.classes_[i],
117
-             "probability": float(prob),
118
-             "confidence": "high" if prob > 0.7 else "medium" if prob > 0.4 else "low"
119
-         })
120
-
121
- # Sort by probability
122
- predicted_labels.sort(key=lambda x: x["probability"], reverse=True)
123
- print("Predicted categories:", predicted_labels)
124
- ```
125
-
126
- ### Streamlit Demo
127
-
128
- The model includes a complete Streamlit web interface for easy testing:
129
-
130
- ```bash
131
- streamlit run app.py
132
- ```
133
 
134
  ## Categories
135
 
@@ -173,7 +107,6 @@ The model classifies documents into 22 Portuguese administrative categories:
173
  | **Hamming Loss** | **0.0426** | Label-wise error rate |
174
  | **Average Precision (macro)** | **0.608** | Macro-averaged AP |
175
  | **Average Precision (micro)** | **0.785** | Micro-averaged AP |
176
- | **Improvement** | **+54.7%** | Over Decision Tree baseline |
177
 
178
 
179
  ## Technical Architecture
@@ -212,19 +145,6 @@ The model was trained on a curated dataset of Portuguese administrative document
212
  - **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
213
  - **Class Imbalance**: Some categories may have lower precision due to limited training examples
214
 
215
- ## Citation
216
-
217
- If you use this model in your research, please cite:
218
-
219
- ```bibtex
220
- @article{intelligent_stacking_2024,
221
- title={Intelligent Stacking for Multilabel Portuguese Administrative Document Classification},
222
- author={[Your Name]},
223
- journal={[Journal Name]},
224
- year={2024},
225
- note={Model available at https://huggingface.co/YOUR_USERNAME/intelligent-stacking}
226
- }
227
- ```
228
 
229
  ## License
230
 
 
6
  sdk: docker
7
  app_port: 8501
8
  tags:
 
9
  - text-classification
10
  - multilabel-classification
11
  - portuguese
 
23
 
24
  ## Model Description
25
 
26
+ **Intelligent Stacking** is an advanced ensemble learning system specialized in multilabel classification of Portuguese administrative documents. The model combines 12 base models with intelligent meta-learning to achieve high performance on municipal and governmental document categorization tasks.
27
 
28
  **Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/YOUR_USERNAME/intelligent-stacking-demo)
29
 
 
32
  - 🧠 **Intelligent Meta-Learning**: Advanced ensemble combination using stacked generalization
33
  - 📚 **12 Base Models**: 3 feature sets × 4 algorithms for robust predictions
34
  - 🇵🇹 **Portuguese Optimized**: Fine-tuned for Portuguese administrative language
35
+ - ⚡ **High Performance**: F1-macro score of 0.5486
36
  - 🏢 **22 Categories**: Comprehensive municipal administrative document classification
37
  - 🎯 **Dynamic Thresholds**: Optimized per-category decision boundaries
38
 
 
64
 
65
  4. **Dynamic Thresholds**: Per-category optimized decision boundaries for multilabel output
66
 
 
67
 
68
  ## Categories
69
 
 
107
  | **Hamming Loss** | **0.0426** | Label-wise error rate |
108
  | **Average Precision (macro)** | **0.608** | Macro-averaged AP |
109
  | **Average Precision (micro)** | **0.785** | Micro-averaged AP |
 
110
 
111
 
112
  ## Technical Architecture
 
145
  - **Threshold Sensitivity**: Performance depends on carefully tuned per-category thresholds
146
  - **Class Imbalance**: Some categories may have lower precision due to limited training examples
147
 
 
148
 
149
  ## License
150
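The "Dynamic Thresholds" item kept in the README means that each category has its own decision boundary rather than a shared 0.5 cutoff. A minimal sketch of that idea follows, assuming one probability and one threshold per category. The labels, probabilities, and thresholds are hypothetical placeholders, not values from the model.

```python
import numpy as np


def apply_thresholds(probs, thresholds, labels):
    """Return (label, probability) pairs whose probability exceeds the
    threshold of its own category, sorted by descending probability."""
    probs = np.asarray(probs, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    keep = np.flatnonzero(probs > thresholds)
    keep = keep[np.argsort(-probs[keep])]
    return [(labels[i], float(probs[i])) for i in keep]


# Hypothetical categories and values, for illustration only.
labels = ["A", "B", "C"]
probs = [0.35, 0.80, 0.45]
thresholds = [0.30, 0.90, 0.40]

result = apply_thresholds(probs, thresholds, labels)
print(result)  # [('C', 0.45), ('A', 0.35)]
```

Category "B" has the highest probability but is rejected because its own threshold is stricter. This is why the README lists threshold sensitivity as a limitation.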