# Model Card: Difficulty Model

## Model Details

- **Model Name:** difficulty_model
- **Model Version:** difficulty_model_v2_baseline_001
- **Algorithm:** RandomForestRegressor
- **Framework:** scikit-learn
- **Trained At:** 2026-05-21T05:59:09.943332+00:00
- **Seed:** 42

## Intended Use

Estimate question difficulty as a continuous score in [0, 1] based on
question features (bloom_score, grade, subject, question_type). Used in
the difficulty estimation endpoint to predict how hard a question is for
a given grade level.

## Training Data

- **Source:** training_lo_tagging.csv, with question_type taken from questions.csv
- **Split Counts:** train=3912, validation=1033, test=875
- **Features:** bloom_score (numeric), grade (numeric), subject (categorical, ordinal-encoded with OrdinalEncoder), question_type (categorical, ordinal-encoded with OrdinalEncoder)
- **Target:** difficulty_score (continuous [0, 1])
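
The feature setup above can be expressed as a scikit-learn pipeline. What follows is a minimal sketch, not the project's training script. The column names and seed come from this card. Everything else is an assumption: using `ColumnTransformer`, mapping unseen categories to -1 via `handle_unknown`, keeping default forest hyperparameters, and the toy data that stands in for the CSVs.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

NUMERIC_FEATURES = ["bloom_score", "grade"]
CATEGORICAL_FEATURES = ["subject", "question_type"]


def build_difficulty_pipeline(seed: int = 42) -> Pipeline:
    """Ordinal-encode the categorical columns, pass numeric ones through, then fit a forest."""
    preprocess = ColumnTransformer([
        # Assumption: unseen categories at inference time are mapped to -1.
        ("categorical",
         OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
         CATEGORICAL_FEATURES),
        ("numeric", "passthrough", NUMERIC_FEATURES),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # Default hyperparameters; only the seed is taken from the model card.
        ("model", RandomForestRegressor(random_state=seed)),
    ])


# Toy data standing in for the real CSVs (hypothetical values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "bloom_score": rng.integers(1, 7, n),
    "grade": rng.integers(1, 13, n),
    "subject": rng.choice(["math", "science", "english"], n),
    "question_type": rng.choice(["mcq", "short_answer"], n),
})
# Synthetic target in [0, 1] that loosely increases with bloom_score.
df["difficulty_score"] = np.clip(df["bloom_score"] / 6 + rng.normal(0, 0.1, n), 0, 1)

X = df[CATEGORICAL_FEATURES + NUMERIC_FEATURES]
y = df["difficulty_score"]
pipe = build_difficulty_pipeline(seed=42).fit(X, y)
preds = pipe.predict(X)
```

A random forest predicts by averaging training targets. Its outputs therefore stay within the range of the training labels, so scores remain in [0, 1] without explicit clipping.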

## Metrics

| Metric | Validation (n=1033) | Test (n=875) |
|---|---|---|
| MAE | 0.3475 | 0.3519 |
| R-squared | 0.5003 | 0.4685 |
| Per-bucket MAE, easy | 0.3058 | 0.3250 |
| Per-bucket MAE, medium | 0.2934 | 0.2885 |
| Per-bucket MAE, hard | 0.6563 | 0.6797 |
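
The metrics above can be computed from predictions with plain Python. This sketch assumes that each question's bucket is its difficulty string label ('easy', 'medium' or 'hard'). `mae` and `r_squared` use the standard definitions, which are also the ones scikit-learn's `mean_absolute_error` and `r2_score` apply to a single unweighted target. The example values are made up.

```python
from collections import defaultdict
from typing import Dict, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (undefined if y_true is constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def per_bucket_mae(y_true: Sequence[float], y_pred: Sequence[float],
                   labels: Sequence[str]) -> Dict[str, float]:
    """MAE computed separately for each difficulty label."""
    errors = defaultdict(list)
    for t, p, label in zip(y_true, y_pred, labels):
        errors[label].append(abs(t - p))
    return {label: sum(errs) / len(errs) for label, errs in errors.items()}


# Hypothetical example values.
y_true = [0.2, 0.5, 0.9, 0.1]
y_pred = [0.3, 0.5, 0.5, 0.1]
labels = ["easy", "medium", "hard", "easy"]
print(mae(y_true, y_pred), r_squared(y_true, y_pred))
print(per_bucket_mae(y_true, y_pred, labels))
```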

## Known Limitations

- Trained on synthetic data only; performance on real questions is unknown.
- The synthetic difficulty_score distribution may not reflect real-world difficulty.
- OrdinalEncoder maps subject and question_type to integer codes in an arbitrary order, which the trees may split on as if it were meaningful even though these categories have no natural ordering.
- Per-bucket MAE groups questions by their difficulty string labels (easy, medium, hard), so it is only as reliable as those labels.
- The feature set is limited to 4 features; adding text-based features could improve performance.
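
The ordinal-encoding limitation can be demonstrated directly. With the default `categories='auto'`, scikit-learn's `OrdinalEncoder` assigns codes in the sorted order of the category values. A single tree split on the resulting codes can separate {english} from {math, science}, or {english, math} from {science}, but never {english, science} from {math}. The subject values below are hypothetical. One-hot encoding is included only as a common alternative that implies no order; it is not what this model uses.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Illustrative subject values (hypothetical), one per row.
subjects = [["science"], ["math"], ["english"], ["math"]]

# OrdinalEncoder sorts the distinct values and numbers them 0, 1, 2, ...
ordinal = OrdinalEncoder().fit(subjects)
codes = ordinal.transform(subjects).ravel()
print(ordinal.categories_[0])  # ['english' 'math' 'science']
print(codes)                   # [2. 1. 0. 1.]

# One-hot encoding gives each subject its own 0/1 column, so no order is implied.
onehot = OneHotEncoder().fit(subjects)
one_hot_matrix = onehot.transform(subjects).toarray()
print(one_hot_matrix.shape)    # (4, 3)
```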

## Fallback Behavior

When the model is not loaded, or its confidence is below the confidence threshold, the system
falls back to rule-based difficulty estimation using bloom_score and
grade-level heuristics.
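
The fallback logic can be sketched as follows. Every specific here is hypothetical, because the card documents neither the threshold value, nor how confidence is computed, nor the actual heuristic. `CONFIDENCE_THRESHOLD`, `rule_based_difficulty`, the `model_predict` callable, the assumed 1 to 6 Bloom scale and the assumed 1 to 12 grade range are all placeholders.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical threshold; the real value is not documented in this card.
CONFIDENCE_THRESHOLD = 0.6

# Hypothetical model callable: returns (difficulty_score, confidence) for one question.
ModelFn = Callable[[Dict[str, object]], Tuple[float, float]]


def rule_based_difficulty(bloom_score: float, grade: int) -> float:
    """Hypothetical heuristic: higher Bloom levels are harder, and a question
    is treated as slightly easier for higher grades.

    Assumes bloom_score on a 1..6 scale and grade in 1..12.
    """
    base = (bloom_score - 1) / 5            # maps Bloom 1..6 onto 0..1
    grade_adjustment = -0.03 * (grade - 6)  # older students find it easier
    return min(1.0, max(0.0, base + grade_adjustment))


def estimate_difficulty(
    features: Dict[str, object],
    model_predict: Optional[ModelFn] = None,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[float, str]:
    """Return (score, source), using the model only when it is loaded and confident."""
    if model_predict is not None:
        score, confidence = model_predict(features)
        if confidence >= threshold:
            return score, "model"
    # Model missing or not confident enough: fall back to the heuristic.
    return rule_based_difficulty(features["bloom_score"], features["grade"]), "rule_based"
```

For a random forest, one possible confidence signal is the spread of per-tree predictions, which are available through the fitted model's `estimators_`. The card does not state whether the system uses this.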