ML Pipeline

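This directory contains the machine learning pipeline for training the complexity classifier, which labels a question as either simple enough for a smaller model or complex enough to need a more capable one.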

Structure

```
ml/
├── data/              # Dataset loading and preprocessing
│   └── load_dataset.py
├── training/          # Model training and evaluation
│   ├── train.py       # DistilBERT fine-tuning
│   └── evaluate.py    # Model evaluation
├── export/            # Model export
│   └── convert_to_onnx.py
└── artifacts/         # Saved models and metrics
    ├── model.onnx
    └── metrics.json
```

Training

```bash
# Train the complexity classifier
python -m ml.training.train --dataset arc --epochs 5

# Evaluate the model
python -m ml.training.evaluate --model-dir ml/artifacts/complexity-classifier

# Export to ONNX
python -m ml.export.convert_to_onnx --model-dir ml/artifacts/complexity-classifier
```
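The README does not say which metrics `evaluate.py` reports or how `metrics.json` is structured. As a rough illustration only, the sketch below assumes a binary labelling (0 = easy, 1 = complex) and computes standard classification metrics in plain Python, printing them as JSON. The helper name `binary_metrics` and the metric keys are hypothetical, not the pipeline's actual output format.

```python
import json
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary easy/complex classifier.

    Hypothetical helper: the real evaluate.py may compute different metrics.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for the "complex" class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every division so empty or degenerate inputs return 0.0 instead of raising.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, not real results: 0 = easy, 1 = complex (assumed encoding).
    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]
    # The key set printed here is an assumption, not the documented schema of metrics.json.
    print(json.dumps(binary_metrics(y_true, y_pred), indent=2))
```

For a routing use case, recall on the complex class is arguably the number to watch, since a missed complex question would be handed to a smaller model.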

Dataset

The classifier is trained on the ARC dataset (AI2 Reasoning Challenge), which provides two subsets:

- Easy examples: simple questions that smaller models can handle
- Challenge examples: complex questions that require more capable models
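One way to turn the two ARC subsets into training data is to treat the subset a question comes from as its label. The sketch below assumes that encoding (Easy = 0, Challenge = 1) and works on plain question strings; the subset keys, the label values and the helpers `label_examples` and `build_complexity_dataset` are hypothetical, and the README does not show how `load_dataset.py` actually does this.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed label encoding; the actual load_dataset.py may use different names or values.
SUBSET_LABELS = {"ARC-Easy": 0, "ARC-Challenge": 1}


def label_examples(questions: Iterable[str], subset: str) -> List[Tuple[str, int]]:
    """Pair each question with the complexity label implied by its ARC subset."""
    if subset not in SUBSET_LABELS:
        raise ValueError(f"unknown ARC subset: {subset!r}")
    label = SUBSET_LABELS[subset]
    return [(q, label) for q in questions]


def build_complexity_dataset(easy: Iterable[str], challenge: Iterable[str]) -> List[Tuple[str, int]]:
    """Merge both subsets into one list of (question, label) pairs."""
    return label_examples(easy, "ARC-Easy") + label_examples(challenge, "ARC-Challenge")


if __name__ == "__main__":
    # Made-up questions standing in for real ARC records.
    data = build_complexity_dataset(
        easy=["Which gas do plants absorb from the air?"],
        challenge=["Why does a metal spoon feel colder than a wooden one at the same temperature?"],
    )
    print(data)
    # Check class counts before training, since the two subsets need not be the same size.
    print(Counter(label for _, label in data))
```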

Alternatively, Easy2Hard-Bench can be used as a source of continuous difficulty scores instead of binary easy/challenge labels.
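Continuous scores do not plug directly into a binary easy/challenge setup; they need either a cut-off or a regression objective. The README does not say which approach the pipeline would take. A minimal sketch of the cut-off option follows, assuming that a higher score means a harder problem; the helper `binarize_difficulty` and its median default are illustrative choices, not part of the pipeline.

```python
from statistics import median
from typing import List, Optional, Sequence


def binarize_difficulty(scores: Sequence[float], threshold: Optional[float] = None) -> List[int]:
    """Map continuous difficulty scores to 0 (easy) or 1 (complex).

    Assumes a higher score means a harder problem. With no explicit threshold,
    the median is used, which gives a roughly balanced split whatever the score scale.
    """
    if not scores:
        return []
    cut = median(scores) if threshold is None else threshold
    # Scores strictly above the cut are labelled complex.
    return [1 if s > cut else 0 for s in scores]


if __name__ == "__main__":
    # Illustrative scores, not taken from Easy2Hard-Bench.
    scores = [0.1, 0.4, 0.35, 0.9]
    print(binarize_difficulty(scores))                 # median cut of 0.375 -> [0, 1, 0, 1]
    print(binarize_difficulty(scores, threshold=0.3))  # fixed cut -> [0, 1, 1, 1]
```

Keeping the raw scores and fine-tuning with a regression head instead avoids choosing a cut-off at all, at the cost of changing the training objective and the exported model's output.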