	---
	title: MMB Dataset Visualizer
	sdk: docker
	app_port: 7860
	---

	# MMB Dataset Visualizer

	Visualizes MMB-style CSV files, either uploaded through the app or discovered under `output/`, `data/`, and `hf_dataset/`. Dataset: [scholo/MMB_dataset](https://huggingface.co/datasets/scholo/MMB_dataset)

	## Data source

	In the sidebar, choose either Upload dataset or CSV files.

	### Upload dataset

	Upload your own dataset as a ZIP or CSV:

	1. Select Upload dataset in the sidebar.
	2. Upload a ZIP (recommended) or CSV file.
	3. Click Use this file. To switch datasets, click Clear uploaded dataset and upload another file.

	ZIP structure (same layout as on disk): include your CSV at the root (or in a subfolder), an `images/` folder with the files referenced by the CSV, and optionally `scenes/` for counterfactual types. Example:

	```
	mydata.zip
	  image_mapping_with_questions.csv
	  images/
	    scene_0001_original.png
	    scene_0001_cf1.png
	    ...
	  scenes/ (optional)
	    scene_0001_cf1.json
	    ...
	```

	CSV-only upload: you can upload a CSV by itself. Image columns then show filenames only (no thumbnails); to get thumbnails, upload a ZIP that includes `images/`.
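
If your files are already laid out on disk, a short script can produce a ZIP in the layout described above. The following is a minimal sketch using only the Python standard library; the function name `build_dataset_zip` and its parameters are hypothetical and not part of this app.

```python
import zipfile
from pathlib import Path


def build_dataset_zip(csv_path, images_dir, zip_path, scenes_dir=None):
    """Pack a CSV, its images folder, and an optional scenes folder into a ZIP.

    The archive gets the CSV at its root, image files under images/,
    and scene JSON files under scenes/ (hypothetical helper, not app code).
    """
    csv_path, images_dir, zip_path = Path(csv_path), Path(images_dir), Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # CSV goes at the archive root.
        zf.write(csv_path, arcname=csv_path.name)
        # Every regular file in the images folder goes under images/.
        for image in sorted(images_dir.iterdir()):
            if image.is_file():
                zf.write(image, arcname=f"images/{image.name}")
        # Scene JSON files are optional.
        if scenes_dir is not None:
            for scene in sorted(Path(scenes_dir).glob("*.json")):
                zf.write(scene, arcname=f"scenes/{scene.name}")
    return zip_path
```

For example, `build_dataset_zip("data/example/image_mapping_with_questions.csv", "data/example/images", "mydata.zip", "data/example/scenes")` would write `mydata.zip` to the current directory, assuming those paths exist.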

	### CSV files

	The app recursively discovers CSVs under `output/`, `data/`, and `hf_dataset/`.
	On the Hugging Face Space, `data/` and `hf_dataset/` are not in the repo because binary images are excluded, so use Upload dataset there instead. When CSVs are found, use the sidebar CSV file dropdown to pick one.

	- `image_mapping_with_questions.csv` — Original + counterfactual images, questions, difficulties, answer matrix (e.g. from the MMB Counterfactual Image Generation Tool).
	- `image_mapping.csv` — Images only (`original_image`, `counterfactual1_image`, `counterfactual2_image`).
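
The recursive discovery described above can be approximated in a few lines. This is a sketch of the idea, not the app's actual code; `discover_csvs` and `SEARCH_DIRS` are hypothetical names.

```python
from pathlib import Path

# Folders the README says are searched; the constant name is an assumption.
SEARCH_DIRS = ("output", "data", "hf_dataset")


def discover_csvs(root="."):
    """Return every *.csv found recursively under the search folders, sorted."""
    root = Path(root)
    found = []
    for name in SEARCH_DIRS:
        folder = root / name
        if folder.is_dir():  # folders that do not exist are skipped
            found.extend(folder.rglob("*.csv"))
    return sorted(found)
```

Skipping folders that do not exist means the same function works on a checkout where `data/` and `hf_dataset/` are absent; it simply returns fewer results.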

	Put your CSV next to `images/` and optionally `scenes/` (e.g. `data/example/image_mapping_with_questions.csv`, `data/example/images/`, `data/example/scenes/`). You get:

	- Image sets — Each scene set as a row with thumbnails (Original, CF1, CF2), counterfactual types (e.g. CF1 `change_color`, CF2 `change_lighting`), scene ID, questions & difficulties. Optional “Include answer matrix in each row.”
	- Overview — Scene-set count, column summary.
	- Difficulty & questions — Bar charts of difficulty (easy/medium/hard) by question type (Original, CF1, CF2).
	- Counterfactual types — Table of scene → CF1 type, CF2 type; bar chart of type counts by slot (from `scenes/_cf1.json`, `_cf2.json`).
	- Answer matrix — 3×3 grid per scene (image × question).

	Thumbnails are loaded from `<csv_directory>/images/`; if that folder is missing, filenames are shown instead. Counterfactual types are read from `cf_metadata.cf_type` in the `_cf1.json` and `_cf2.json` files under `<csv_directory>/scenes/`; if `scenes/` is missing, types are omitted.
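
To make the lookup rules concrete, here is a minimal sketch of how an image filename and a counterfactual type could be resolved relative to the CSV. It assumes scene files are named `<scene_id>_cf1.json` and `<scene_id>_cf2.json` (inferred from the `scene_0001_cf1.json` example) and that `cf_metadata` is a JSON object with a `cf_type` key. The function names are hypothetical and this is not the app's own code.

```python
import json
from pathlib import Path


def resolve_image(csv_path, filename):
    """Return <csv_directory>/images/<filename>, or None if the file is absent."""
    candidate = Path(csv_path).parent / "images" / filename
    return candidate if candidate.is_file() else None


def read_cf_type(csv_path, scene_id, slot):
    """Return cf_metadata.cf_type for a scene and slot ("cf1" or "cf2"), or None.

    Assumes the file name pattern <scene_id>_<slot>.json inside scenes/.
    """
    scene_file = Path(csv_path).parent / "scenes" / f"{scene_id}_{slot}.json"
    if not scene_file.is_file():
        return None  # scenes/ is missing, or there is no file for this slot
    with scene_file.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data.get("cf_metadata", {}).get("cf_type")
```

Returning `None` in both helpers mirrors the fallback behavior described above: a caller can show the bare filename when no image is found and leave the type blank when no scene file exists.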

	## Deploy to Hugging Face (dataset + Space)

	Push to both the dataset and the Streamlit Space:

	```bash
	pip install -r requirements-upload.txt huggingface_hub
	hf auth login # if not in PATH: python -m huggingface_hub.cli.hf auth login

	python scripts/deploy_both.py
	```

	This uploads:
	- Dataset → [scholo/MMB_dataset](https://huggingface.co/datasets/scholo/MMB_dataset)
	- Space → [scholo/Datasetviewer](https://huggingface.co/spaces/scholo/Datasetviewer)

	Options:
	- `--dataset-only` — Only push the dataset
	- `--space-only` — Only push the Space

	Dataset source: The dataset is read from the `hf_dataset/` folder. Put your CSV, `images/`, and `scenes/` there. See `hf_dataset/README.md`.

	To push only the dataset manually:

	```bash
	python scripts/upload_to_huggingface.py hf_dataset/image_mapping_with_questions.csv --repo-id scholo/MMB_dataset
	```
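
Before pushing, it can help to confirm that the dataset folder has the expected layout. The sketch below checks for a CSV and an `images/` folder, then uploads the folder with `HfApi.upload_folder` from `huggingface_hub`. It is an illustration only and does not reproduce what `scripts/upload_to_huggingface.py` does; `check_dataset_folder` and `push_dataset` are hypothetical names, and `scenes/` is treated as optional here.

```python
from pathlib import Path


def check_dataset_folder(folder="hf_dataset"):
    """Return a list of layout problems found in the folder (empty if none)."""
    folder = Path(folder)
    if not folder.is_dir():
        return [f"{folder} does not exist"]
    problems = []
    if not any(folder.glob("*.csv")):
        problems.append("no CSV file at the folder root")
    if not (folder / "images").is_dir():
        problems.append("images/ folder is missing")
    return problems


def push_dataset(folder="hf_dataset", repo_id="scholo/MMB_dataset"):
    """Upload the folder to a Hugging Face dataset repo after a layout check."""
    problems = check_dataset_folder(folder)
    if problems:
        raise ValueError("; ".join(problems))
    # Imported lazily so the layout check works without huggingface_hub installed.
    from huggingface_hub import HfApi

    # Requires a prior `hf auth login` with write access to the repo.
    HfApi().upload_folder(folder_path=str(folder), repo_id=repo_id, repo_type="dataset")
```

Running `check_dataset_folder()` on its own makes no network calls, so it is safe to use as a quick sanity check before either deploy path.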

	## Setup and run

	```bash
	pip install -r requirements.txt
	streamlit run app.py
	```

	Open the URL that Streamlit prints in the terminal (e.g. `http://localhost:8501`).