title: MMIB Counterfactual Image Tool
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit
arxiv: 2401.xxxxx

A Python-based pipeline for generating CLEVR-style scenes with counterfactual variants and Blender rendering. Includes configurable object counts, diverse counterfactual types, and automated question-answer dataset generation.

Key Features

18 Counterfactual Types: Includes 10 Semantic (Image) CFs that change VQA answers and 8 Negative CFs (perceptual stress tests) that should not change answers.
Automated Blender Rendering: Integrated pipeline using Blender's Cycles engine with optional GPU support.
QA Dataset Generation: Automatically generates questions and answers tailored to the specific counterfactual changes applied.
Robust Workflow: Includes resume support for interrupted jobs and a Streamlit web interface for interactive use.

Hugging Face Spaces Deployment

This app is configured to run on Hugging Face Spaces using Docker. The included Dockerfile handles the Blender installation and all necessary dependencies for scene generation and rendering.

Requirements

Python Dependencies

The pipeline requires several Python libraries (primarily pandas, pp-vqa, streamlit, and visual geometry libraries). Install them using:

```bash
pip install -r requirements.txt
```

Blender

Blender must be installed separately, as it is not available via pip. The core scripts are designed to run inside Blender's bundled Python environment. The pipeline attempts to auto-detect Blender on your system PATH; alternatively, specify the executable path manually via the --blender_path argument.

Installation

Clone this repository:

```bash
git clone https://huggingface.co/spaces/scholo/MMB_Counterfactual_Image_Tool MMB_Tool
cd MMB_Tool
```

Install Python dependencies:

```bash
pip install -r requirements.txt
```
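The Blender auto-detection mentioned under Requirements can be approximated in a few lines. The sketch below is only an illustration: `find_blender` is a hypothetical helper, not a function from this repository, and the pipeline's actual detection logic may check additional locations.

```python
import os
import shutil
from typing import Optional


def find_blender(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable Blender executable path, or None if none is found.

    Hypothetical helper for illustration; the real pipeline may differ.
    """
    # An explicitly supplied path (e.g. from --blender_path) takes priority.
    if explicit_path and explicit_path != "auto":
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        return None
    # Otherwise look for a "blender" executable on the system PATH.
    return shutil.which("blender")


if __name__ == "__main__":
    path = find_blender()
    print(path if path else "Blender not found; pass --blender_path explicitly")
```

Checking for the executable before starting a long run avoids generating scene files that cannot then be rendered.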
Usage

Quick Start

Generate 5 scene sets (original + variants) with 5 objects each using default settings:

```bash
python pipeline.py --num_scenes 5 --num_objects 5 --run_name my_first_run
```
1. The Main Pipeline (pipeline.py)

This is the primary entry point. By default, it generates scene JSON files and then renders them to PNG images.

Common Commands

```bash
# Generate 10 scene sets, skip rendering (JSON only)
python pipeline.py --num_scenes 10 --num_objects 5 --run_name exp_json_only --skip_render

# Render previously generated scenes
python pipeline.py --render_only --run_name exp_json_only --use_gpu 1

# Generate and validate questions immediately after rendering
python pipeline.py --num_scenes 10 --num_objects 5 --generate_questions
```
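The first two commands above form a two-stage workflow: write the scene JSON first, then render it later (for example on a GPU machine). A minimal sketch of scripting that from Python follows. The function names are hypothetical; only the command-line flags come from this README, and the script assumes it is run from the repository root. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
import sys
from typing import List


def generate_cmd(run_name: str, num_scenes: int, num_objects: int) -> List[str]:
    """Stage 1: write scene JSON files only (no rendering)."""
    return [
        sys.executable, "pipeline.py",
        "--num_scenes", str(num_scenes),
        "--num_objects", str(num_objects),
        "--run_name", run_name,
        "--skip_render",
    ]


def render_cmd(run_name: str, use_gpu: bool = False) -> List[str]:
    """Stage 2: render the JSON files already present in the run folder."""
    return [
        sys.executable, "pipeline.py",
        "--render_only",
        "--run_name", run_name,
        "--use_gpu", "1" if use_gpu else "0",
    ]


def run_two_stage(run_name: str, num_scenes: int, num_objects: int,
                  use_gpu: bool = False, dry_run: bool = True) -> List[List[str]]:
    """Print both commands and, unless dry_run is set, execute them in order."""
    commands = [
        generate_cmd(run_name, num_scenes, num_objects),
        render_cmd(run_name, use_gpu),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the workflow if a stage fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_two_stage("exp_json_only", num_scenes=10, num_objects=5, use_gpu=True)
```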
Detailed Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --num_scenes | 5 | Number of scene sets to generate. |
| --num_objects | None | Fixed number of objects per scene (overrides min/max). |
| --min_objects | 3 | Minimum object count if --num_objects is unset. |
| --max_objects | 7 | Maximum object count if --num_objects is unset. |
| --num_counterfactuals | 2 | Number of counterfactual variants per scene set. |
| --blender_path | auto | Path to Blender executable. |
| --output_dir | output | Base directory for runs. |
| --run_name | timestamp | Custom name for this run folder. |
| --use_gpu | 0 | 1 = enable GPU rendering, 0 = CPU. |
| --samples | 512 | Cycles rendering samples (higher = better quality, slower). |
| --width / --height | 320 / 240 | Image resolution. |
| --skip_render | False | Generate JSON scene files only. |
| --render_only | False | Render existing JSON files in a run directory. |
| --resume | False | Resume interrupted rendering in an existing run. |
| --generate_questions | False | Generate QA dataset immediately after rendering. |
| --cf_types | None | Space-separated list of specific CF types to use. |
| --list_cf_types | False | Print available counterfactual types and exit. |

2. Counterfactual Types

The pipeline supports 18 different counterfactual types. If --cf_types is unset, the pipeline defaults to 1 Image CF + 1 Negative CF per scene.

Image Counterfactuals (Semantic - should change VQA answers):

change_color: Change the color of a random object.
change_shape: Change shape (cube/sphere/cylinder).
change_size: Change size (small ↔ large).
change_material: Change material (metal ↔ rubber).
change_position: Move object (includes collision detection).
add_object: Add a new random object.
remove_object: Remove a random object.
replace_object: Replace object (keeping position).
swap_attribute: Swap an attribute (e.g. color) between two objects.
relational_flip: Move object from left of X to right of X.

Negative Counterfactuals (Perceptual stress tests - should NOT change VQA answers):

change_background: Change the ground color.
change_lighting: Change lighting (bright/dim/warm/cool, etc.).
add_noise: Add image noise/grain.
occlusion_change: Move object to partially hide another (visual only).
apply_fisheye: Apply fisheye lens distortion.
apply_blur: Apply Gaussian blur.
apply_vignette: Apply vignette effect.
apply_chromatic_aberration: Apply chromatic aberration.

3. Generating Question Mappings

If not run as part of the main pipeline, use scripts/generate_questions_mapping.py to create a CSV with counterfactual questions and answers.

Examples

```bash
# Generate validated questions for a specific completed run
python scripts/generate_questions_mapping.py --output_dir output/experiment1 --generate_questions

# Generate wide and long format QA datasets
python scripts/generate_questions_mapping.py --output_dir output/my_run --generate_questions --long_format
```
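To make the defaults from sections 1 and 2 concrete, here is a rough sketch of how the object count and counterfactual types for one scene set might be resolved from the arguments. The type names are copied from the lists above. The function names, the uniform sampling between --min_objects and --max_objects, and the sampling without replacement from a user-supplied --cf_types list are assumptions, not the pipeline's confirmed behavior.

```python
import random
from typing import List, Optional, Sequence

IMAGE_CFS = [
    "change_color", "change_shape", "change_size", "change_material",
    "change_position", "add_object", "remove_object", "replace_object",
    "swap_attribute", "relational_flip",
]
NEGATIVE_CFS = [
    "change_background", "change_lighting", "add_noise", "occlusion_change",
    "apply_fisheye", "apply_blur", "apply_vignette", "apply_chromatic_aberration",
]


def resolve_num_objects(num_objects: Optional[int], min_objects: int = 3,
                        max_objects: int = 7, rng=None) -> int:
    """A fixed --num_objects overrides the min/max range."""
    rng = rng or random
    if num_objects is not None:
        return num_objects
    # Assumption: uniform draw over the inclusive range.
    return rng.randint(min_objects, max_objects)


def pick_cf_types(cf_types: Optional[Sequence[str]] = None,
                  num_counterfactuals: int = 2, rng=None) -> List[str]:
    """Choose the counterfactual types for one scene set."""
    rng = rng or random
    if not cf_types:
        # Documented default: 1 Image CF + 1 Negative CF per scene.
        return [rng.choice(IMAGE_CFS), rng.choice(NEGATIVE_CFS)]
    unknown = set(cf_types) - set(IMAGE_CFS) - set(NEGATIVE_CFS)
    if unknown:
        raise ValueError(f"Unknown counterfactual types: {sorted(unknown)}")
    # Assumption: distinct types are drawn from the user-supplied list.
    k = min(num_counterfactuals, len(cf_types))
    return rng.sample(list(cf_types), k)


if __name__ == "__main__":
    rng = random.Random(0)
    print(resolve_num_objects(None, rng=rng), pick_cf_types(rng=rng))
```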
Ensuring Valid Counterfactual QA

The pipeline ensures that, for semantic CFs, the counterfactual image's answer to its counterfactual question differs from the original image's answer to the original question.

Targeted Templates: Questions are chosen from templates targeting the changed attribute or object count.
Automatic Retries: The script retries up to 50 times per scene (configurable via MAX_CF_ANSWER_RETRIES inside the script) to find a question pair that ensures an answer flip.

4. Advanced: Generating 720p Semantic Datasets

To generate a 1280×720 set consisting only of semantic (image) counterfactuals, with unique types per variant and answer validation filtering:

```bash
python pipeline.py --num_scenes 1000 --run_name dataset_720p \
  --width 1280 --height 720 --num_counterfactuals 2 \
  --cf_types change_color change_shape change_size change_material change_position \
             add_object remove_object replace_object swap_attribute relational_flip \
  --generate_questions --filter_same_answer
```
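The answer-flip check and retry loop described under Ensuring Valid Counterfactual QA can be sketched as follows. Only the constant name MAX_CF_ANSWER_RETRIES and its value of 50 come from this README. `find_flipping_pair` and its callback parameters are hypothetical stand-ins for the script's template sampling and answer computation, which are not shown here.

```python
import random
from typing import Callable, Optional, Tuple

MAX_CF_ANSWER_RETRIES = 50  # name and default value taken from the README

# (original question, counterfactual question)
QuestionPair = Tuple[str, str]


def find_flipping_pair(
    propose_pair: Callable[[], QuestionPair],
    answer_original: Callable[[str], str],
    answer_counterfactual: Callable[[str], str],
    max_retries: int = MAX_CF_ANSWER_RETRIES,
) -> Optional[QuestionPair]:
    """Return the first question pair whose answers differ, or None."""
    for _ in range(max_retries):
        q_orig, q_cf = propose_pair()
        # Accept only when the counterfactual answer flips.
        if answer_original(q_orig) != answer_counterfactual(q_cf):
            return q_orig, q_cf
    return None  # the caller can skip or filter this scene


if __name__ == "__main__":
    # Toy stand-ins: an add_object counterfactual changes the count only.
    original_answers = {"How many objects are there?": "3",
                        "What color is the cube?": "red"}
    counterfactual_answers = {"How many objects are there?": "4",
                              "What color is the cube?": "red"}
    questions = list(original_answers)
    rng = random.Random(0)

    def propose() -> QuestionPair:
        question = rng.choice(questions)
        return question, question

    print(find_flipping_pair(propose, original_answers.__getitem__,
                             counterfactual_answers.__getitem__))
```

Returning None after the retry budget is exhausted keeps the decision about unflippable scenes with the caller; a filter such as --filter_same_answer could then drop those rows.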
5. Other Scripts

Generate Examples: Create one example of every counterfactual type applied to a base scene.

```bash
python scripts/generate_examples.py --render
```
6. Web Interface (Streamlit)

For an interactive experience with parameter configuration and visual previews:

```bash
streamlit run app.py
```
Project Structure

```plaintext
.
├── pipeline.py                     # Main pipeline entry point
├── app.py                          # Streamlit web interface
├── requirements.txt                # Python dependencies
├── Dockerfile                      # Docker configuration for HF Spaces
├── data/                           # Scene assets (base blend, properties, definitions)
├── scripts/                        # Internal scripts (render, generate_scenes, QA)
└── output/                         # Output directory for runs
```
Note: The pipeline dynamically generates render_images_patched.py and temp_output/ directories during execution. These are automatically cleaned up and are ignored by git.

Output Structure

Each run creates a structured directory:

```plaintext
output/<run_name>/
├── run_metadata.json                # Configuration used for this run
├── scenes/                          # Scene JSON files
├── images/                          # Rendered PNG images
├── image_mapping_with_questions.csv # Wide-format QA dataset
└── qa_dataset.csv                   # Long-format QA dataset (if requested)
```
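A quick way to sanity-check a finished (or interrupted) run is to compare what is on disk against the layout above. The sketch below is illustrative: `summarize_run` is a hypothetical helper, and it assumes scene files end in .json and rendered images in .png somewhere under their respective folders. The contents of run_metadata.json are not interpreted, only loaded.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_run(run_dir) -> Dict[str, Any]:
    """Report which parts of the documented run layout exist on disk."""
    run_dir = Path(run_dir)
    scenes_dir = run_dir / "scenes"
    images_dir = run_dir / "images"
    metadata_path = run_dir / "run_metadata.json"

    # Count files recursively so nested sub-folders are tolerated.
    num_scenes = len(list(scenes_dir.rglob("*.json"))) if scenes_dir.is_dir() else 0
    num_images = len(list(images_dir.rglob("*.png"))) if images_dir.is_dir() else 0
    metadata = json.loads(metadata_path.read_text()) if metadata_path.is_file() else None

    return {
        "metadata": metadata,
        "num_scene_files": num_scenes,
        "num_images": num_images,
        "has_wide_csv": (run_dir / "image_mapping_with_questions.csv").is_file(),
        "has_long_csv": (run_dir / "qa_dataset.csv").is_file(),
    }


if __name__ == "__main__":
    summary = summarize_run(Path("output") / "my_first_run")
    for key, value in summary.items():
        print(f"{key}: {value}")
```

If the number of images is lower than the number of scene files, rendering was probably interrupted, which is the situation the --resume flag is meant for.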
Contributing

Contributions are welcome! Please ensure that code follows the existing style (clean formatting, single blank lines between functions) and that docstrings are concise.

Acknowledgments

This project is inspired by the CLEVR dataset and uses Blender for 3D scene rendering.

License

MIT License