# Model Card: Difficulty Model

## Model Details

- **Model Name:** difficulty_model
- **Model Version:** difficulty_model_v2_baseline_001
- **Algorithm:** RandomForestRegressor
- **Framework:** scikit-learn
- **Trained At:** 2026-05-21T05:59:09.943332+00:00
- **Seed:** 42

## Intended Use

Estimates question difficulty as a continuous score in [0, 1] from four
question features (bloom_score, grade, subject, question_type). The model
backs the difficulty estimation endpoint, which predicts how hard a
question is for a given grade level.

## Training Data

- **Source:** training_lo_tagging.csv + questions.csv (for question_type)
- **Split Counts:** train=3912, validation=1033, test=875 (5820 total)
- **Features:** bloom_score (numeric), grade (numeric), subject (ordinal-encoded), question_type (ordinal-encoded)
- **Target:** difficulty_score (continuous, in [0, 1])
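
The configuration listed above can be sketched as a scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the project's training script. Only the algorithm, the seed, the four feature names, and the target name come from this card. The toy rows, the category values, the ColumnTransformer layout, and the handling of unseen categories are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

NUMERIC = ["bloom_score", "grade"]
CATEGORICAL = ["subject", "question_type"]

# Hypothetical toy rows; the real data comes from the CSV files named in the card.
train = pd.DataFrame({
    "bloom_score": [1, 2, 3, 4, 5, 6],
    "grade": [3, 4, 5, 6, 7, 8],
    "subject": ["math", "science", "math", "english", "science", "math"],
    "question_type": ["mcq", "mcq", "short", "short", "essay", "essay"],
    "difficulty_score": [0.10, 0.25, 0.40, 0.55, 0.70, 0.90],
})

# Map each category to an integer code. Sending unseen categories to -1 is an
# assumed choice; the card does not say how unknown values are handled.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

preprocess = ColumnTransformer(
    transformers=[("cat", encoder, CATEGORICAL)],
    remainder="passthrough",  # bloom_score and grade pass through unchanged
)

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(random_state=42)),  # seed from the card
])
model.fit(train[NUMERIC + CATEGORICAL], train["difficulty_score"])

# Score one new question; "history" was never seen in training.
question = pd.DataFrame([{
    "bloom_score": 4, "grade": 6, "subject": "history", "question_type": "mcq",
}])
score = float(model.predict(question)[0])
print(round(score, 3))
```

A random forest averages training targets, so its predictions stay inside the range of difficulty_score values seen during training.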

## Metrics

### Validation Set
- MAE: 0.3475
- R-squared: 0.5003
- Per-bucket MAE:
  - easy: 0.3058
  - medium: 0.2934
  - hard: 0.6563

### Test Set
- MAE: 0.3519
- R-squared: 0.4685
- Per-bucket MAE:
  - easy: 0.325
  - medium: 0.2885
  - hard: 0.6797
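
The three metric types reported above can be computed as follows. This is a minimal sketch, assuming each row carries an easy/medium/hard bucket label. The arrays are made-up values, so the output shows the form of the metrics, not the figures in this card.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up values for illustration only.
y_true = np.array([0.10, 0.20, 0.45, 0.50, 0.80, 0.95])
y_pred = np.array([0.15, 0.10, 0.40, 0.60, 0.60, 0.70])
bucket = np.array(["easy", "easy", "medium", "medium", "hard", "hard"])

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# MAE computed separately on the rows belonging to each bucket.
per_bucket_mae = {
    name: float(mean_absolute_error(y_true[bucket == name], y_pred[bucket == name]))
    for name in ("easy", "medium", "hard")
}

print(f"MAE: {mae:.4f}")
print(f"R-squared: {r2:.4f}")
print(per_bucket_mae)
```

The overall MAE is the row-weighted average of the per-bucket values, so a bucket with a high MAE but few rows shifts the overall figure only slightly.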

## Known Limitations

- Trained on synthetic data only — performance on real questions is unknown.
- difficulty_score distribution may not reflect real-world difficulty.
- OrdinalEncoder assumes an ordering that may not be meaningful for subject/question_type.
- Per-bucket MAE depends on the quality of the easy/medium/hard difficulty string labels used to form the buckets.
- Limited feature set (4 features); text-based features could improve performance.
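
The OrdinalEncoder limitation can be seen directly. The encoder numbers categories in sorted order, which implies a ranking among subjects that carries no meaning. This is a minimal sketch with hypothetical subject names. OneHotEncoder is shown as one possible alternative, not as something this model uses.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical subject values.
subjects = np.array([["science"], ["english"], ["math"]])

ordinal = OrdinalEncoder().fit(subjects)
# Categories are sorted alphabetically and numbered 0, 1, 2, ...
ordinal_codes = ordinal.transform(subjects).ravel()
print(list(ordinal.categories_[0]), ordinal_codes)

# One-hot encoding gives each subject its own column and implies no order.
one_hot = OneHotEncoder().fit_transform(subjects).toarray()
print(one_hot)
```

Tree-based models split on thresholds of the integer code, so categories with adjacent codes tend to be grouped together even when they are unrelated.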

## Fallback Behavior

When the model is not loaded or its prediction confidence falls below the
threshold, the system falls back to a rule-based difficulty estimate
derived from bloom_score and grade-level heuristics.
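
The fallback control flow can be sketched as below. Everything in this block is hypothetical, because the card does not state the confidence measure, the threshold value, or the heuristic itself. The names CONFIDENCE_THRESHOLD, rule_based_difficulty, and estimate_difficulty, the 0.5 threshold, the 1 to 6 bloom_score range, the 1 to 12 grade range, and the weights are placeholders chosen only to make the logic runnable.

```python
from typing import Callable, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.5  # placeholder; the real threshold is not stated


def rule_based_difficulty(bloom_score: float, grade: float) -> float:
    """Placeholder heuristic: higher Bloom level and higher grade read as harder."""
    bloom_part = (bloom_score - 1) / 5  # assumes bloom_score in 1..6
    grade_part = (grade - 1) / 11       # assumes grade in 1..12
    raw = 0.7 * bloom_part + 0.3 * grade_part  # made-up weights
    return min(1.0, max(0.0, raw))  # clamp to [0, 1]


def estimate_difficulty(
    bloom_score: float,
    grade: float,
    predict: Optional[Callable[[float, float], Tuple[float, float]]] = None,
) -> Tuple[float, str]:
    """Return (score, source); use the rules if there is no model or confidence is low."""
    if predict is not None:
        score, confidence = predict(bloom_score, grade)
        if confidence >= CONFIDENCE_THRESHOLD:
            return score, "model"
    return rule_based_difficulty(bloom_score, grade), "rules"


print(estimate_difficulty(4, 6))                            # model not loaded
print(estimate_difficulty(4, 6, lambda b, g: (0.62, 0.2)))  # low confidence
print(estimate_difficulty(4, 6, lambda b, g: (0.62, 0.9)))  # model prediction used
```

Returning the source alongside the score lets callers tell a model prediction from a rule-based estimate.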