---
language: en
license: mit
tags:
- multi-label-classification
- tfidf
- embeddings
- random-forest
- oversampling
- mlsmote
- software-engineering
datasets:
- NLBSE/SkillCompetition
model-index:
- name: random_forest_tfidf_gridsearch
  results:
  - status: success
    metrics:
      cv_best_f1_micro: 0.595038375202279
      test_precision_micro: 0.690371373744215
      test_recall_micro: 0.5287455692919513
      test_f1_micro: 0.5988446098110252
    params:
      estimator__max_depth: '10'
      estimator__min_samples_split: '2'
      estimator__n_estimators: '200'
      feature_type: tfidf
      model_type: RandomForest + MultiOutput
      use_cleaned: 'True'
      oversampling: 'False'
    dvc:
      path: random_forest_tfidf_gridsearch.pkl
- name: random_forest_tfidf_gridsearch_smote
  results:
  - status: success
    metrics:
      cv_best_f1_micro: 0.59092598557871
      test_precision_micro: 0.6923300238053766
      test_recall_micro: 0.5154318319356791
      test_f1_micro: 0.59092598557871
    params:
      feature_type: tfidf
      oversampling: 'MLSMOTE (RandomOverSampler fallback)'
    dvc:
      path: random_forest_tfidf_gridsearch_smote.pkl
- name: random_forest_embedding_gridsearch
  results:
  - status: success
    metrics:
      cv_best_f1_micro: 0.6012826418169578
      test_precision_micro: 0.703060266254212
      test_recall_micro: 0.5252460640075934
      test_f1_micro: 0.6012826418169578
    params:
      feature_type: embedding
      oversampling: 'False'
    dvc:
      path: random_forest_embedding_gridsearch.pkl
- name: random_forest_embedding_gridsearch_smote
  results:
  - status: success
    metrics:
      cv_best_f1_micro: 0.5962084744755453
      test_precision_micro: 0.7031004709576139
      test_recall_micro: 0.5175288364319172
      test_f1_micro: 0.5962084744755453
    params:
      feature_type: embedding
      oversampling: 'MLSMOTE (RandomOverSampler fallback)'
    dvc:
      path: random_forest_embedding_gridsearch_smote.pkl
---
# Model cards for committed models

## Overview
- This file documents four trained model artifacts available in the repository: two TF‑IDF based Random Forest models (baseline and with oversampling) and two embedding‑based Random Forest models (baseline and with oversampling).
- For dataset provenance and preprocessing details see `data/README.md`.
## 1) random_forest_tfidf_gridsearch

### Model details
- Name: `random_forest_tfidf_gridsearch`
- Organization: Hopcroft (se4ai2526-uniba)
- Model type: `RandomForestClassifier` wrapped in `MultiOutputClassifier` for multi-label outputs
- Branch: `Milestone-4`
### Intended use
- Suitable for research and benchmarking on multi-label skill prediction for GitHub PRs/issues. Not intended for automated high‑stakes decisions or profiling individuals without further validation.
### Training data and preprocessing
- Dataset: Processed SkillScope Dataset (NLBSE/SkillCompetition) as prepared for this project.
- Features: TF‑IDF (unigrams and bigrams), up to `MAX_TFIDF_FEATURES=5000`.
- Feature and label files are referenced via `get_feature_paths(feature_type='tfidf', use_cleaned=True)` in `config.py`.
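The TF-IDF configuration described above can be sketched as follows; the toy corpus and variable names are illustrative, not the project's actual code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Unigrams and bigrams, capped at 5,000 features, as described above.
MAX_TFIDF_FEATURES = 5000
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=MAX_TFIDF_FEATURES)

# Toy corpus standing in for the processed PR/issue texts.
corpus = [
    "fix null pointer exception in parser",
    "add unit tests for parser module",
]
X = vectorizer.fit_transform(corpus)  # sparse matrix, shape (n_docs, n_features)
```

In the project itself the fitted vectorizer should be persisted alongside the model so that inference uses the identical vocabulary.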
### Evaluation
- Reported metrics include micro‑precision, micro‑recall and micro‑F1 on a held‑out test split.
- Protocol: 80/20 multilabel‑stratified split; hyperparameters selected via 5‑fold cross‑validation optimizing `f1_micro`.
- MLflow run: `random_forest_tfidf_gridsearch` (see `hopcroft_skill_classification_tool_competition/config.py`).
### Limitations and recommendations
- Trained on Java repositories; generalization to other languages is not ensured.
- Label imbalance affects rare labels; apply per‑label thresholds or further sampling strategies if required.
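The per-label threshold suggestion can be sketched as follows. Here `probas` stands in for the output of `MultiOutputClassifier.predict_proba` (one `(n_samples, 2)` array per label, column 1 being the positive-class probability), and the threshold values are purely illustrative, not tuned:

```python
import numpy as np

# Hypothetical per-label probabilities for 2 samples and 2 labels,
# mimicking the list-of-arrays shape returned by predict_proba.
probas = [
    np.array([[0.8, 0.2], [0.3, 0.7]]),    # label 0
    np.array([[0.55, 0.45], [0.9, 0.1]]),  # label 1
]
thresholds = [0.5, 0.3]  # illustrative per-label cut-offs

# Apply each label's threshold to its positive-class column.
y_pred = np.column_stack([
    (p[:, 1] >= t).astype(int) for p, t in zip(probas, thresholds)
])
```

Thresholds would typically be tuned per label on a validation split (e.g. to maximize per-label F1) rather than fixed at 0.5.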
### Usage
- Artifact path: `models/random_forest_tfidf_gridsearch.pkl`.
- Example:
```python
import joblib

# X_tfidf: TF-IDF feature matrix built with the same fitted vectorizer
# used at training time (e.g. a saved `tfidf_vectorizer.pkl`).
model = joblib.load('models/random_forest_tfidf_gridsearch.pkl')
y = model.predict(X_tfidf)
```
## 2) random_forest_tfidf_gridsearch_smote

### Model details
- Name: `random_forest_tfidf_gridsearch_smote`
- Model type: `RandomForestClassifier` inside `MultiOutputClassifier` trained with multi‑label oversampling
### Intended use
- Intended to improve recall for under‑represented labels by applying MLSMOTE (or RandomOverSampler fallback) during training.
### Training and preprocessing
- Features: TF‑IDF (same configuration as the baseline).
- Oversampling: local MLSMOTE implementation when available; otherwise `RandomOverSampler`. Oversampling metadata (method and synthetic sample counts) is logged to MLflow.
- Training script: `hopcroft_skill_classification_tool_competition/modeling/train.py` (action `smote`).
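For illustration only (this is not the project's MLSMOTE code), a minimal random-oversampling fallback for multi-label data can be sketched as duplicating instances that carry under-supported labels:

```python
import random

def oversample_rare_labels(X, Y, target=3, seed=0):
    """Duplicate instances carrying labels whose support is below `target`.

    Illustrative fallback only; MLSMOTE instead synthesizes new feature
    vectors rather than duplicating existing ones.
    """
    rng = random.Random(seed)
    n_labels = len(Y[0])
    X_out, Y_out = list(X), [list(y) for y in Y]
    for label in range(n_labels):
        idx = [i for i, y in enumerate(Y) if y[label] == 1]
        # Keep duplicating random positive instances until support reaches target.
        while idx and sum(y[label] for y in Y_out) < target:
            i = rng.choice(idx)
            X_out.append(X[i])
            Y_out.append(list(Y[i]))
    return X_out, Y_out

X = ["doc a", "doc b", "doc c"]
Y = [[1, 0], [1, 0], [0, 1]]  # label 1 is rare (support 1)
X_os, Y_os = oversample_rare_labels(X, Y, target=3)
```

Duplication raises rare-label support but adds no new information, which is why MLSMOTE is preferred when available.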
### Evaluation
- MLflow run: `random_forest_tfidf_gridsearch_smote`.
### Limitations and recommendations
- Synthetic samples may introduce distributional artifacts; validate synthetic examples and per‑label metrics before deployment.
### Usage
- Artifact path: `models/random_forest_tfidf_gridsearch_smote.pkl`.
## 3) random_forest_embedding_gridsearch

### Model details
- Name: `random_forest_embedding_gridsearch`
- Features: sentence embeddings produced by `all-MiniLM-L6-v2` (see `config.EMBEDDING_MODEL_NAME`).
### Intended use
- Uses semantic embeddings to capture contextual information from PR text; suitable for research and prototyping.
### Training and preprocessing
- Embeddings generated and stored via `get_feature_paths(feature_type='embedding', use_cleaned=True)`.
- Training script: see `hopcroft_skill_classification_tool_competition/modeling/train.py`.
### Evaluation
- MLflow run: `random_forest_embedding_gridsearch`.
### Limitations and recommendations
- Embeddings encode dataset biases; verify performance when transferring to other repositories or languages.
### Usage
- Artifact path: `models/random_forest_embedding_gridsearch.pkl`.
- Example:
```python
import joblib

# X_embeddings: sentence-embedding matrix produced with all-MiniLM-L6-v2
model = joblib.load('models/random_forest_embedding_gridsearch.pkl')
y = model.predict(X_embeddings)
```
## 4) random_forest_embedding_gridsearch_smote

### Model details
- Name: `random_forest_embedding_gridsearch_smote`
- Combines embedding features with multi‑label oversampling to address rare labels.
### Training and evaluation
- Oversampling: MLSMOTE preferred; `RandomOverSampler` fallback if MLSMOTE is unavailable.
- MLflow run: `random_forest_embedding_gridsearch_smote`.
### Limitations and recommendations
- Review synthetic examples and re‑evaluate on target data prior to deployment.
### Usage
- Artifact path: `models/random_forest_embedding_gridsearch_smote.pkl`.
## Publishing guidance for Hugging Face Hub

- The YAML front-matter enables rendering on the Hugging Face Hub. Recommended repository contents for publishing:
  - `README.md` (this file)
  - model artifact(s) (`*.pkl`)
  - vectorizer(s) and label map (e.g. `tfidf_vectorizer.pkl`, `label_names.pkl`)
  - a minimal inference example or notebook
## Evaluation Data and Protocol
- Evaluation split: an 80/20 multilabel‑stratified train/test split was used for final evaluation.
- Cross-validation: hyperparameters were selected via 5‑fold cross‑validation optimizing `f1_micro`.
- Test metrics reported: micro precision, micro recall, micro F1 (reported in the YAML `model-index` for each model).
## Quantitative Analyses

- Reported results: micro-precision, micro-recall and micro-F1 on the held-out test split for each model.
- Where available, `cv_best_f1_micro` is the best cross‑validation f1_micro recorded during training; when a CV value was not present in tracking, the test F1 is used as a proxy and noted in the README.
- Notes on comparability: TF‑IDF and embedding models are evaluated on the same held‑out splits (features differ); reported metrics are comparable for broad benchmarking but not for per‑label fairness analyses.
## How Metrics Were Computed
- Metrics were computed using scikit‑learn's `precision_score`, `recall_score`, and `f1_score` with `average='micro'` and `zero_division=0` on the held‑out test labels and model predictions.
- Test feature and label files used are available under `data/processed/tfidf/` and `data/processed/embedding/` (paths referenced from `hopcroft_skill_classification_tool_competition.config.get_feature_paths`).
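The metric computation described above can be reproduced on a toy example (the labels below are illustrative; the real evaluation uses the held-out test split):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy multi-label indicator matrices: 3 samples, 2 labels.
y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 0], [0, 0], [1, 1]]

# Micro averaging pools TP/FP/FN across all labels before computing the score.
p = precision_score(y_true, y_pred, average='micro', zero_division=0)
r = recall_score(y_true, y_pred, average='micro', zero_division=0)
f1 = f1_score(y_true, y_pred, average='micro', zero_division=0)
```

Here TP=3, FP=0, FN=1, so micro-precision is 1.0, micro-recall 0.75, and micro-F1 = 2·TP/(2·TP+FP+FN) = 6/7.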
## Ethical Considerations and Caveats

- The dataset contains examples from Java repositories; model generalization to other languages or domains is not guaranteed.
- Label imbalance is present; oversampling (MLSMOTE or RandomOverSampler fallback) was used in two variants to improve recall for rare labels. Inspect per-label metrics before deploying.
- The models and README are intended for research and benchmarking. They are not validated for safety‑critical or high‑stakes automated decisioning.