# EduScale AI Benchmark Dataset Card

## Dataset Description

This repository includes benchmark result files for evaluating EduScale AI on educational slide images. The benchmark files contain per-image image quality, OCR, and runtime metrics for the 2x and 3x TensorFlow Lite models.

The underlying image files are not included in this repository. The CSV files are included as benchmark evidence for model comparison.

## Data Sources

The project data was gathered from two source categories:

| Source | Description | Redistribution Status |
|---|---|---|
| [Dataset for ppt](https://www.kaggle.com/datasets/manisha717/dataset-for-ppt) by Manisha717 on Kaggle | Public collection of PPT files for testing, covering varied presentation topics | Not redistributed here because the Kaggle page lists the license as unknown |
| Creator-generated samples | Additional educational slide/image samples created by the project author | Used for project evaluation and development |

The benchmark CSVs are included to document model behavior without republishing the original PPT files or source images.

## Files

| File | Description |
|---|---|
| `benchmarks/benchmark_x2.csv` | Per-image benchmark results for the 2x model |
| `benchmarks/benchmark_x3.csv` | Per-image benchmark results for the 3x model |
| `benchmarks/benchmark_summary.csv` | Averaged model comparison summary |

## CSV Schema

The detailed benchmark files use the following columns:

| Column | Description |
|---|---|
| `file` | Evaluation image filename |
| `psnr` | Peak signal-to-noise ratio |
| `ssim` | Structural similarity index |
| `runtime_ms` | Inference runtime in milliseconds |
| `lr_conf` | OCR confidence on low-resolution input |
| `sr_conf` | OCR confidence after super-resolution |
| `hr_conf` | OCR confidence on high-resolution reference |
| `baseline_cer` | Character error rate for the baseline input |
| `sr_cer` | Character error rate after super-resolution |

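As a rough illustration of how the detail columns listed above might be consumed, the sketch below reads a detail CSV with Python's standard `csv` module and checks that the documented columns are present. It assumes a comma-delimited file with a header row and that every metric cell parses as a float; neither is confirmed by this card. The helper `read_detail_rows`, the derived `cer_gain` field, and the sample values are hypothetical and exist only for this example.

```python
import csv
import io

# Column names taken from the detail schema table in this card.
DETAIL_COLUMNS = [
    "file", "psnr", "ssim", "runtime_ms",
    "lr_conf", "sr_conf", "hr_conf",
    "baseline_cer", "sr_cer",
]


def read_detail_rows(handle):
    """Parse a detail benchmark CSV into dicts with float metrics.

    Hypothetical helper: assumes a header row and numeric metric cells.
    """
    reader = csv.DictReader(handle)
    missing = set(DETAIL_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        row = {"file": raw["file"]}
        for name in DETAIL_COLUMNS[1:]:
            row[name] = float(raw[name])
        # Derived field (not in the CSV): a positive value means the
        # character error rate dropped after super-resolution.
        row["cer_gain"] = row["baseline_cer"] - row["sr_cer"]
        rows.append(row)
    return rows


# Made-up sample data for illustration only, NOT real benchmark results.
SAMPLE = """file,psnr,ssim,runtime_ms,lr_conf,sr_conf,hr_conf,baseline_cer,sr_cer
slide_001.png,30.0,0.875,120.0,70.0,80.0,90.0,0.25,0.125
slide_002.png,32.0,0.9375,100.0,60.0,75.0,88.0,0.5,0.25
"""

rows = read_detail_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["file"], row["cer_gain"])
```

To run this against the real files, the `io.StringIO(SAMPLE)` argument could be replaced with a handle such as `open("benchmarks/benchmark_x2.csv", newline="")`. If some rows contain empty cells, for example when OCR returns no confidence, the `float()` conversion would need to be adjusted.
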
The summary file uses:

| Column | Description |
|---|---|
| `model` | Model identifier |
| `scale` | Upscaling factor |
| `psnr` | Average PSNR |
| `ssim` | Average SSIM |
| `ocr_confidence` | Average `sr_conf` |
| `cer` | Average `sr_cer` |
| `runtime_ms` | Average runtime in milliseconds |
| `device` | Device used for benchmark measurement |

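The relationship between the detail columns and the summary columns can be sketched as a small aggregation step. This is only an illustration under stated assumptions: it takes "average" to mean the arithmetic mean over all per-image rows, which the card does not specify, and the `summarize` function, the `"x2"` model label, and the sample rows are hypothetical. The device name is the one given in this card.

```python
from statistics import fmean

# Summary column -> detail column it averages, following the schema tables.
SUMMARY_SOURCES = {
    "psnr": "psnr",
    "ssim": "ssim",
    "ocr_confidence": "sr_conf",
    "cer": "sr_cer",
    "runtime_ms": "runtime_ms",
}


def summarize(rows, model, scale, device):
    """Build one summary row from per-image rows (hypothetical helper).

    Assumes an unweighted arithmetic mean over every row.
    """
    if not rows:
        raise ValueError("cannot summarize an empty benchmark")
    summary = {"model": model, "scale": scale}
    for out_name, src_name in SUMMARY_SOURCES.items():
        summary[out_name] = fmean(row[src_name] for row in rows)
    summary["device"] = device
    return summary


# Made-up per-image rows for illustration only, NOT real benchmark results.
detail_rows = [
    {"psnr": 30.0, "ssim": 0.875, "sr_conf": 80.0, "sr_cer": 0.125, "runtime_ms": 120.0},
    {"psnr": 32.0, "ssim": 0.9375, "sr_conf": 75.0, "sr_cer": 0.25, "runtime_ms": 100.0},
]

summary = summarize(detail_rows, model="x2", scale=2, device="Realme Note 50")
print(summary)
```

Comparing such a recomputed row with the matching line in `benchmarks/benchmark_summary.csv` is one way to check that the summary and detail files agree. Small differences could simply reflect rounding or a different averaging rule in the original pipeline.
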
## Evaluation Device

Benchmarks in `benchmark_summary.csv` are labeled as measured on:

`Realme Note 50`

## Known Limitations

- The CSV files provide benchmark metrics, not the original image dataset.
- Results are specific to the evaluation pipeline and device used.
- The benchmark domain is educational slides and may not generalize to all scanned documents, handwriting, photos, or non-educational imagery.