EduScale AI Benchmark Dataset Card

Dataset Description

This repository includes benchmark result files for evaluating EduScale AI on educational slide images. Each detailed file records per-image quality metrics (PSNR, SSIM), OCR metrics (confidence, character error rate), and inference runtime for the 2x or 3x TensorFlow Lite model.

The underlying image files are not included in this repository. The CSV files are included as benchmark evidence for model comparison.

Data Sources

The project data was gathered from two source categories:

Source	Description	Redistribution Status
Dataset for ppt by Manisha717 on Kaggle	Public collection of PPT files for testing, covering varied presentation topics	Not redistributed here because the Kaggle page lists the license as unknown
Creator-generated samples	Additional educational slide/image samples created by the project author	Used for project evaluation and development

The benchmark CSVs are included to document model behavior without republishing the original PPT files or source images.

Files

File	Description
`benchmarks/benchmark_x2.csv`	Per-image benchmark results for the 2x model
`benchmarks/benchmark_x3.csv`	Per-image benchmark results for the 3x model
`benchmarks/benchmark_summary.csv`	Averaged model comparison summary

CSV Schema

The detailed benchmark files use the following columns:

Column	Description
`file`	Evaluation image filename
`psnr`	Peak signal-to-noise ratio
`ssim`	Structural similarity index
`runtime_ms`	Inference runtime in milliseconds
`lr_conf`	OCR confidence on low-resolution input
`sr_conf`	OCR confidence after super-resolution
`hr_conf`	OCR confidence on high-resolution reference
`baseline_cer`	Character error rate for the baseline input
`sr_cer`	Character error rate after super-resolution
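
As a minimal sketch of how these columns can be consumed, the snippet below parses a detailed benchmark file with Python's standard `csv` module and derives two per-image comparisons: the OCR confidence gain (`sr_conf - lr_conf`) and the CER reduction (`baseline_cer - sr_cer`). It assumes the files are comma-delimited, that the header names match the table above exactly, and that every metric cell holds a number. The helper names `read_benchmark` and `ocr_deltas` are hypothetical, and the sample rows are made-up values for illustration, not real benchmark results.

```python
import csv
import io

# Numeric columns as listed in the schema table above.
NUMERIC_COLUMNS = (
    "psnr", "ssim", "runtime_ms",
    "lr_conf", "sr_conf", "hr_conf",
    "baseline_cer", "sr_cer",
)


def read_benchmark(handle):
    """Parse a per-image benchmark CSV into dicts with float metrics."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"file": record["file"]}
        for column in NUMERIC_COLUMNS:
            # Assumption: every metric cell is a non-empty number.
            row[column] = float(record[column])
        rows.append(row)
    return rows


def ocr_deltas(row):
    """Compare OCR metrics before and after super-resolution for one image."""
    return {
        "file": row["file"],
        # Positive gain: OCR is more confident on the super-resolved image.
        "conf_gain": row["sr_conf"] - row["lr_conf"],
        # Positive reduction: fewer character errors after super-resolution.
        "cer_reduction": row["baseline_cer"] - row["sr_cer"],
    }


# Made-up sample values for illustration only; not real benchmark results.
SAMPLE_CSV = """file,psnr,ssim,runtime_ms,lr_conf,sr_conf,hr_conf,baseline_cer,sr_cer
slide_a.png,30.0,0.875,120.0,60.0,80.0,90.0,0.25,0.125
slide_b.png,28.0,0.75,130.0,50.0,75.0,88.0,0.5,0.25
"""

rows = read_benchmark(io.StringIO(SAMPLE_CSV))
deltas = [ocr_deltas(row) for row in rows]
```

To run the same code against a real file, open it with `open("benchmarks/benchmark_x2.csv", newline="", encoding="utf-8")` and pass the handle to `read_benchmark` in place of the in-memory sample.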

The summary file uses:

Column	Description
`model`	Model identifier
`scale`	Upscaling factor
`psnr`	Average PSNR
`ssim`	Average SSIM
`ocr_confidence`	Average `sr_conf`
`cer`	Average `sr_cer`
`runtime_ms`	Average runtime in milliseconds
`device`	Device used for benchmark measurement
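
The mapping from per-image rows to a summary row can be sketched as follows. This is an illustration under assumptions, not the project's actual aggregation script: it treats "average" as the arithmetic mean over all rows, and the function name `summarize`, the model identifier `example_x2`, and the sample values are all hypothetical. The renaming of `sr_conf` to `ocr_confidence` and `sr_cer` to `cer` follows the table above.

```python
from statistics import fmean


def summarize(rows, model, scale, device):
    """Average per-image rows into one record shaped like the summary schema."""
    if not rows:
        raise ValueError("need at least one per-image row")
    return {
        "model": model,
        "scale": scale,
        "psnr": fmean(r["psnr"] for r in rows),
        "ssim": fmean(r["ssim"] for r in rows),
        # Per the summary schema: ocr_confidence is the average sr_conf.
        "ocr_confidence": fmean(r["sr_conf"] for r in rows),
        # Per the summary schema: cer is the average sr_cer.
        "cer": fmean(r["sr_cer"] for r in rows),
        "runtime_ms": fmean(r["runtime_ms"] for r in rows),
        "device": device,
    }


# Made-up per-image values for illustration only; not real benchmark results.
sample_rows = [
    {"psnr": 30.0, "ssim": 0.875, "runtime_ms": 120.0, "sr_conf": 80.0, "sr_cer": 0.125},
    {"psnr": 28.0, "ssim": 0.75, "runtime_ms": 130.0, "sr_conf": 75.0, "sr_cer": 0.25},
]

summary = summarize(sample_rows, model="example_x2", scale=2, device="Realme Note 50")
```

Recomputing the averages this way from `benchmark_x2.csv` and `benchmark_x3.csv` gives a quick consistency check against the corresponding rows of the summary file, provided the assumption about the averaging method holds.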

Evaluation Device

Benchmarks in `benchmarks/benchmark_summary.csv` are labeled as measured on:

Realme Note 50

Known Limitations

The CSV files provide benchmark metrics, not the original image dataset.
Results are specific to the evaluation pipeline and the device used (Realme Note 50); runtime figures in particular may differ on other hardware.
The benchmark domain is educational slides and may not generalize to all scanned documents, handwriting, photos, or non-educational imagery.