# Model Card: Difficulty Model
## Model Details
- **Model Name:** difficulty_model
- **Model Version:** difficulty_model_v2_baseline_001
- **Algorithm:** RandomForestRegressor
- **Framework:** scikit-learn
- **Trained At:** 2026-05-21T05:59:09.943332+00:00
- **Seed:** 42
## Intended Use
Estimate question difficulty as a continuous score in [0, 1] based on
question features (bloom_score, grade, subject, question_type). Used in
the difficulty estimation endpoint to predict how hard a question is for
a given grade level.
## Training Data
- **Source:** training_lo_tagging.csv + questions.csv (for question_type)
- **Split Counts:** train=3912, validation=1033, test=875
- **Features:** bloom_score (numeric), grade (numeric), subject (OrdinalEncoded), question_type (OrdinalEncoded)
- **Target:** difficulty_score (continuous [0, 1])
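As a rough illustration of the feature handling listed above, the following sketch ordinal-encodes the two categorical columns, stacks them next to the numeric ones, and fits a `RandomForestRegressor` with seed 42. The toy rows, the bloom_score scale, `n_estimators=50`, and the mapping of unseen categories to `-1` are all assumptions made for this example; the card does not state the actual preprocessing settings or hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OrdinalEncoder

# Toy rows invented for illustration; not taken from the training CSVs.
# Numeric columns: bloom_score, grade (the bloom_score scale is assumed).
numeric = np.array([[1.0, 3], [2.0, 5], [4.0, 8], [6.0, 10]], dtype=float)
# Categorical columns: subject, question_type.
categorical = np.array(
    [
        ["math", "mcq"],
        ["science", "short_answer"],
        ["math", "short_answer"],
        ["science", "mcq"],
    ],
    dtype=object,
)
y = np.array([0.1, 0.3, 0.6, 0.9])  # difficulty_score in [0, 1]

# Assumption: unseen categories are mapped to -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X = np.column_stack([numeric, encoder.fit_transform(categorical)])

# Seed 42 comes from the card; n_estimators is an assumed value.
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X, y)

# Score a new question whose subject was never seen during fitting.
new_numeric = np.array([[3.0, 6]])
new_categorical = np.array([["history", "mcq"]], dtype=object)
new_encoded = encoder.transform(new_categorical)
X_new = np.column_stack([new_numeric, new_encoded])

# Clip defensively so the reported score stays inside [0, 1].
prediction = float(np.clip(model.predict(X_new)[0], 0.0, 1.0))
```

Because ordinal codes impose an arbitrary order on subject and question_type, a tree-based model can still split on them, but the codes carry no meaningful magnitude.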
## Metrics
### Validation Set
- MAE: 0.3475
- R-squared: 0.5003
- Per-bucket MAE: {'easy': 0.3058, 'medium': 0.2934, 'hard': 0.6563}
### Test Set
- MAE: 0.3519
- R-squared: 0.4685
- Per-bucket MAE: {'easy': 0.325, 'medium': 0.2885, 'hard': 0.6797}
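For readers unfamiliar with the per-bucket figures, here is a minimal sketch of how a per-bucket MAE could be computed: absolute errors are grouped by each question's difficulty label and averaged within each group. The function name `per_bucket_mae` and the sample values are hypothetical; the card does not show the evaluation code that produced the numbers above.

```python
from collections import defaultdict


def per_bucket_mae(y_true, y_pred, buckets):
    """Return the mean absolute error for each difficulty label."""
    errors = defaultdict(list)
    for target, predicted, bucket in zip(y_true, y_pred, buckets):
        errors[bucket].append(abs(target - predicted))
    # Average the absolute errors collected for each bucket.
    return {bucket: sum(errs) / len(errs) for bucket, errs in errors.items()}


# Made-up values, for illustration only.
y_true = [0.1, 0.2, 0.5, 0.9]
y_pred = [0.2, 0.2, 0.4, 0.6]
buckets = ["easy", "easy", "medium", "hard"]

result = per_bucket_mae(y_true, y_pred, buckets)
```

A bucket with few samples yields a noisy average, so it helps to report the per-bucket counts alongside these values.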
## Known Limitations
- Trained on synthetic data only; performance on real questions is unknown.
- difficulty_score distribution may not reflect real-world difficulty.
- OrdinalEncoder assumes an ordering that may not be meaningful for subject/question_type.
- Per-bucket MAE depends on the quality of the difficulty string labels.
- Limited feature set (4 features); text-based features could improve performance.
## Fallback Behavior
When the model is not loaded or its prediction confidence falls below the
configured threshold, the system falls back to a rule-based difficulty
estimate derived from bloom_score and grade-level heuristics.
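The fallback logic described above might look roughly like the sketch below. Everything in it is an assumption for illustration: the names `rule_based_difficulty` and `estimate_difficulty`, the 0.7/0.3 weighting, a bloom_score already normalized to [0, 1], a maximum grade of 12, and the 0.5 threshold. The card specifies neither the real heuristic nor how confidence is derived, so confidence is simply passed in by the caller.

```python
from typing import Callable, Optional, Tuple


def rule_based_difficulty(bloom_score: float, grade: int, max_grade: int = 12) -> float:
    """Hypothetical heuristic: weighted mix of bloom_score and relative grade."""
    raw = 0.7 * bloom_score + 0.3 * (grade / max_grade)
    return min(1.0, max(0.0, raw))


def estimate_difficulty(
    predict: Optional[Callable[[float, int], float]],
    confidence: Optional[float],
    bloom_score: float,
    grade: int,
    threshold: float = 0.5,  # assumed value; the card does not give one
) -> Tuple[float, str]:
    """Return (score, source), where source is 'model' or 'fallback'."""
    model_unavailable = predict is None
    low_confidence = confidence is None or confidence < threshold
    if model_unavailable or low_confidence:
        return rule_based_difficulty(bloom_score, grade), "fallback"
    # Clip the model output so the score stays inside [0, 1].
    score = predict(bloom_score, grade)
    return min(1.0, max(0.0, score)), "model"


# Model not loaded: the heuristic is used.
no_model = estimate_difficulty(None, None, bloom_score=0.5, grade=6)

# Model loaded but confidence too low: the heuristic is used again.
low_conf = estimate_difficulty(lambda b, g: 0.8, 0.2, bloom_score=0.5, grade=6)

# Model loaded and confident: the model prediction is returned.
confident = estimate_difficulty(lambda b, g: 0.8, 0.9, bloom_score=0.5, grade=6)
```

Returning the source label alongside the score lets callers log how often the fallback path is taken.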