
EduScale AI Benchmark Dataset Card

Dataset Description

This repository includes benchmark result files for evaluating EduScale AI on educational slide images. For the 2x and 3x TensorFlow Lite models, the benchmark files record per-image metrics for image quality (PSNR, SSIM), OCR performance (confidence and character error rate), and runtime.

The underlying image files are not included in this repository. The CSV files are included as benchmark evidence for model comparison.

Data Sources

The project data was gathered from two source categories:

| Source | Description | Redistribution status |
|---|---|---|
| Dataset for ppt by Manisha717 on Kaggle | Public collection of PPT files for testing, covering varied presentation topics | Not redistributed here because the Kaggle page lists the license as unknown |
| Creator-generated samples | Additional educational slide/image samples created by the project author | Used for project evaluation and development |

The benchmark CSVs are included to document model behavior without republishing the original PPT files or source images.

Files

| File | Description |
|---|---|
| `benchmarks/benchmark_x2.csv` | Per-image benchmark results for the 2x model |
| `benchmarks/benchmark_x3.csv` | Per-image benchmark results for the 3x model |
| `benchmarks/benchmark_summary.csv` | Averaged model comparison summary |

CSV Schema

The detailed benchmark files use the following columns:

| Column | Description |
|---|---|
| `file` | Evaluation image filename |
| `psnr` | Peak signal-to-noise ratio (PSNR) |
| `ssim` | Structural similarity index (SSIM) |
| `runtime_ms` | Inference runtime in milliseconds |
| `lr_conf` | OCR confidence on the low-resolution input |
| `sr_conf` | OCR confidence after super-resolution |
| `hr_conf` | OCR confidence on the high-resolution reference |
| `baseline_cer` | Character error rate (CER) for the baseline input |
| `sr_cer` | Character error rate (CER) after super-resolution |
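The card does not say how `psnr` or the CER columns were computed (for example, the peak pixel value, whether PSNR is measured on RGB or luminance, or how OCR text is normalized before comparison). As a point of reference only, the sketch below implements the textbook definitions in plain Python: PSNR as `10 * log10(MAX**2 / MSE)` in dB, and CER as the Levenshtein edit distance between the reference text and the OCR output divided by the reference length. The function names `psnr`, `levenshtein`, and `cer` and the default `max_value=255.0` (8-bit pixels) are illustrative assumptions, not the EduScale pipeline. SSIM is omitted because it depends on windowed local statistics.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened pixel sequences.

    Assumption: max_value=255.0 (8-bit pixels); the card does not state this.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10 * math.log10(max_value ** 2 / mse)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `cer("hello", "helo")` returns `0.2` (one deletion over five reference characters). Under this definition CER is not capped at 1.0: OCR output with many spurious insertions can score above it.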

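The summary file uses the following columns:

| Column | Description |
|---|---|
| `model` | Model identifier |
| `scale` | Upscaling factor |
| `psnr` | Average PSNR |
| `ssim` | Average SSIM |
| `ocr_confidence` | Average of `sr_conf` (OCR confidence after super-resolution) |
| `cer` | Average of `sr_cer` (character error rate after super-resolution) |
| `runtime_ms` | Average runtime in milliseconds |
| `device` | Device used for benchmark measurement |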
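Each summary column corresponds to one per-image column (`ocr_confidence` from `sr_conf`, `cer` from `sr_cer`, the rest by name). A minimal sketch of that aggregation follows, assuming "average" means an unweighted arithmetic mean over every row and that all metric cells are numeric; the card does not state how `benchmark_summary.csv` was actually produced or how missing values were handled. `summarize` and `summarize_file` are hypothetical helpers.

```python
import csv
from statistics import mean
from typing import Dict, Iterable


def summarize(rows: Iterable[Dict[str, str]], model: str, scale: int, device: str) -> Dict[str, object]:
    """Average per-image metrics into one summary row.

    Assumption: unweighted arithmetic mean over all rows, no missing values.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no benchmark rows")

    def avg(column: str) -> float:
        # CSV cells arrive as strings; convert before averaging.
        return mean(float(row[column]) for row in rows)

    return {
        "model": model,
        "scale": scale,
        "psnr": avg("psnr"),
        "ssim": avg("ssim"),
        "ocr_confidence": avg("sr_conf"),  # summary column maps to sr_conf
        "cer": avg("sr_cer"),              # summary column maps to sr_cer
        "runtime_ms": avg("runtime_ms"),
        "device": device,
    }


def summarize_file(path: str, model: str, scale: int, device: str) -> Dict[str, object]:
    """Read a per-image benchmark CSV (header row expected) and summarize it."""
    with open(path, newline="", encoding="utf-8") as f:
        return summarize(csv.DictReader(f), model, scale, device)
```

A call such as `summarize_file("benchmarks/benchmark_x2.csv", model="eduscale_x2", scale=2, device="Realme Note 50")` would produce one summary row; `"eduscale_x2"` is a placeholder, since the card does not list the identifiers used in the `model` column.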

Evaluation Device

Benchmarks in `benchmark_summary.csv` are labeled as measured on a Realme Note 50.

Known Limitations

  • The CSV files provide benchmark metrics, not the original image dataset.
  • Results are specific to the evaluation pipeline and device used.
  • The benchmark domain is educational slides and may not generalize to all scanned documents, handwriting, photos, or non-educational imagery.