title: MMIB Dataset Visualizer
sdk: docker
app_port: 7860
MMB Dataset Visualizer
Visualizes MMIB-style CSVs, either uploaded through the sidebar or discovered under output/, data/, and hf_dataset/. Dataset: scholo/MMB_dataset
Data source
In the sidebar, choose either Upload dataset or CSV files.
Upload dataset
Upload your own dataset as a ZIP or CSV:
- Select Upload dataset in the sidebar.
- Upload a ZIP (recommended) or CSV file.
- Click Use this file. To remove it and upload another, click Clear uploaded dataset.
ZIP structure (same layout as on disk): include your CSV at the root (or in a subfolder), an images/ folder with the files referenced by the CSV, and optionally scenes/ for counterfactual types. Example:
mydata.zip
  image_mapping_with_questions.csv
  images/
    scene_0001_original.png
    scene_0001_cf1.png
    ...
  scenes/ (optional)
    scene_0001_cf1.json
    ...
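If your dataset already sits in a folder with this layout, a ZIP for upload could be built roughly as follows. This is a minimal sketch, not part of the app: the helper name zip_dataset_dir is hypothetical, and it assumes the CSV lives at the root of the folder.

```python
import zipfile
from pathlib import Path


def zip_dataset_dir(src_dir: Path, zip_path: Path) -> list[str]:
    """Pack top-level CSVs plus images/ and scenes/ from src_dir into zip_path.

    Hypothetical helper: returns the archive member names that were written.
    """
    src_dir = Path(src_dir)
    members = sorted(src_dir.glob("*.csv"))  # CSV(s) at the root of the ZIP
    for sub in ("images", "scenes"):  # scenes/ is optional and skipped if absent
        folder = src_dir / sub
        if folder.is_dir():
            members += sorted(p for p in folder.rglob("*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # Store paths relative to src_dir so the ZIP mirrors the on-disk layout.
            archive.write(path, arcname=path.relative_to(src_dir).as_posix())
    return [p.relative_to(src_dir).as_posix() for p in members]
```

For example, zip_dataset_dir(Path("data/example"), Path("mydata.zip")) would produce an archive with the CSV at the root and images/ (plus scenes/, if present) beside it.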
CSV-only upload: you can upload just a CSV. Image columns will show filenames only (no thumbnails) unless you use a ZIP with images/.
CSV files
The app recursively discovers CSVs under output/, data/, and hf_dataset/. Use the CSV file dropdown in the sidebar to pick one.
On the Hugging Face Space, data/ and hf_dataset/ are not in the repo (binary images are excluded), so use Upload dataset there instead.
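The recursive discovery described above could look roughly like this. It is a sketch under assumptions, not the code in app.py: the function name discover_csvs and the base_dir parameter are hypothetical.

```python
from pathlib import Path

# Folders named in this README; the app's actual search list may differ.
SEARCH_DIRS = ("output", "data", "hf_dataset")


def discover_csvs(base_dir: Path = Path(".")) -> list[Path]:
    """Return every *.csv found recursively under the known data folders."""
    found: list[Path] = []
    for name in SEARCH_DIRS:
        folder = Path(base_dir) / name
        if folder.is_dir():  # missing folders (e.g. on the Space) are skipped
            found.extend(folder.rglob("*.csv"))
    return sorted(found)
```

Skipping missing folders matches the Space behaviour noted above, where data/ and hf_dataset/ are absent and the list may simply be empty.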
- image_mapping_with_questions.csv: Original + counterfactual images, questions, difficulties, answer matrix (e.g. from the MMB Counterfactual Image Generation Tool).
- image_mapping.csv: Images only (original_image, counterfactual1_image, counterfactual2_image).
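To check that a CSV carries the three image columns listed above before loading it into the visualizer, something like the following could be used. This is an illustrative sketch: read_image_rows is a hypothetical helper, and it assumes image_mapping_with_questions.csv uses the same three image column names as image_mapping.csv.

```python
import csv
from pathlib import Path

# Column names taken from the image_mapping.csv description above.
IMAGE_COLUMNS = ("original_image", "counterfactual1_image", "counterfactual2_image")


def read_image_rows(csv_path: Path) -> list[dict]:
    """Return one dict per scene set holding only the three image columns."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in IMAGE_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"{csv_path} is missing columns: {missing}")
        # Extra columns (questions, difficulties, ...) are ignored here.
        return [{c: row[c] for c in IMAGE_COLUMNS} for row in reader]
```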
Put your CSV next to images/ and optionally scenes/ (e.g. data/example/image_mapping_with_questions.csv, data/example/images/, data/example/scenes/). You get:
- Image sets: Each scene set as a row with thumbnails (Original, CF1, CF2), counterfactual types (e.g. CF1 change_color, CF2 change_lighting), scene ID, questions & difficulties. Optional "Include answer matrix in each row."
- Overview: Scene-set count, column summary.
- Difficulty & questions: Bar charts of difficulty (easy/medium/hard) by question type (Original, CF1, CF2).
- Counterfactual types: Table mapping each scene to its CF1 and CF2 types; bar chart of type counts by slot (from scenes/*_cf1.json, *_cf2.json).
- Answer matrix: 3×3 grid per scene (image × question).
Thumbnails use <csv_directory>/images/. If missing, filenames are shown. Counterfactual types come from <csv_directory>/scenes/ (cf_metadata.cf_type in *_cf1.json, *_cf2.json). If scenes/ is missing, types are omitted.
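The lookup rules above can be sketched as two small helpers. This is not the app's actual code: the function names are hypothetical, and the scene-file naming (<scene_id>_cf<slot>.json) is inferred from the scene_0001_cf1.json example earlier in this README.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_thumbnail(csv_path: Path, filename: str) -> Optional[Path]:
    """Return <csv_directory>/images/<filename> if it exists, else None.

    A caller would fall back to showing the bare filename when None is returned.
    """
    candidate = Path(csv_path).parent / "images" / filename
    return candidate if candidate.is_file() else None


def read_cf_type(csv_path: Path, scene_id: str, slot: int) -> Optional[str]:
    """Read cf_metadata.cf_type from <csv_directory>/scenes/<scene_id>_cf<slot>.json."""
    scene_file = Path(csv_path).parent / "scenes" / f"{scene_id}_cf{slot}.json"
    if not scene_file.is_file():
        return None  # scenes/ missing: the type is simply omitted
    with scene_file.open(encoding="utf-8") as handle:
        data = json.load(handle)
    return data.get("cf_metadata", {}).get("cf_type")
```

Returning None in both helpers mirrors the documented fallbacks: filenames instead of thumbnails, and no type when scenes/ is absent.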
Deploy to Hugging Face (dataset + Space)
Push to both the dataset and the Streamlit Space:
pip install -r requirements-upload.txt huggingface_hub
hf auth login # if not in PATH: python -m huggingface_hub.cli.hf auth login
python scripts/deploy_both.py
This uploads:
- Dataset → scholo/MMB_dataset
- Space → scholo/Datasetviewer
Options:
- --dataset-only: Only push the dataset
- --space-only: Only push the Space
Dataset source: The dataset is read from the hf_dataset/ folder. Put your CSV, images/, and scenes/ there. See hf_dataset/README.md.
To push only the dataset manually:
python scripts/upload_to_huggingface.py hf_dataset/image_mapping_with_questions.csv --repo-id scholo/MMB_dataset
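For reference, pushing the hf_dataset/ folder directly from Python with huggingface_hub might look like the sketch below. This is an assumption about one possible approach, not a description of what scripts/deploy_both.py or scripts/upload_to_huggingface.py do; the helper names are hypothetical, while HfApi.create_repo and HfApi.upload_folder are existing huggingface_hub calls. It requires a prior hf auth login.

```python
from pathlib import Path


def dataset_upload_kwargs(folder: str = "hf_dataset",
                          repo_id: str = "scholo/MMB_dataset") -> dict:
    """Build the keyword arguments for HfApi.upload_folder (hypothetical helper)."""
    return {
        "folder_path": str(Path(folder)),
        "repo_id": repo_id,
        "repo_type": "dataset",  # upload to a dataset repo, not a model or Space
    }


def push_dataset(folder: str = "hf_dataset", repo_id: str = "scholo/MMB_dataset"):
    """Upload the whole folder (CSV, images/, scenes/) to the dataset repo."""
    # Imported lazily so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    return api.upload_folder(**dataset_upload_kwargs(folder, repo_id))
```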
Setup and run
pip install -r requirements.txt
streamlit run app.py
Open the URL shown (e.g. http://localhost:8501).