	# EMMA: Extracting Multiple physical parameters from Multimodal Data

	CVPR 2026

	[Farhat Shaikh](https://scholar.google.com/citations?hl=en&user=mbAOSW0AAAAJ), [Ayan Banerjee](https://scholar.google.com/citations?user=UAlc7tEAAAAJ&hl=en), [Sandeep K. S. Gupta](https://scholar.google.com/citations?user=U9bcQkMAAAAJ&hl=en)

	IMPACT Lab, School of Computing & Augmented Intelligence (SCAI), Arizona State University

	[Project page](https://impactlabasu.github.io/EMMA-CVPR2026/) · [Demo video](https://youtu.be/Uo79pVlM6Rk)

	---

	## Overview

	EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Prior video-only approaches struggle with occluded states and hidden actuation inputs, and they assume known initial conditions. EMMA instead jointly infers explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model.

	The user supplies the parametric structure of the governing ODE; EMMA solves the inverse problem of recovering its parameters, along with any latent forcing and invariants, from multimodal observations.
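
To make the inverse-problem framing concrete, here is a minimal sketch in plain Python. It is not EMMA's implementation. It assumes the Torricelli drainage law dh/dt = -k·sqrt(h) as the user-supplied structure, generates noise-free synthetic heights from a hypothetical `k`, and recovers `k` by minimizing the simulation error. A golden-section search stands in for EMMA's LTC network and differentiable solver; every function name and numeric value below is illustrative.

```python
import math


def simulate_heights(k, h0, dt, steps):
    """Integrate dh/dt = -k * sqrt(h) with classic RK4 and return h at every step."""
    def dhdt(h):
        return -k * math.sqrt(max(h, 0.0))  # clamp so sqrt never sees a negative height

    heights = [h0]
    h = h0
    for _ in range(steps):
        k1 = dhdt(h)
        k2 = dhdt(h + 0.5 * dt * k1)
        k3 = dhdt(h + 0.5 * dt * k2)
        k4 = dhdt(h + dt * k3)
        h = max(h + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0, 0.0)
        heights.append(h)
    return heights


def mse(k, observed, h0, dt):
    """Mean squared error between simulated and observed heights."""
    simulated = simulate_heights(k, h0, dt, len(observed) - 1)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def fit_drainage(observed, h0, dt, lo=0.0, hi=1.0, iterations=60):
    """Golden-section search for the k in [lo, hi] that minimizes the simulation error."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if mse(c, observed, h0, dt) < mse(d, observed, h0, dt):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


# Synthetic observations from the closed-form solution h(t) = (sqrt(h0) - k*t/2)^2.
# TRUE_K, H0, DT and STEPS are hypothetical values chosen for illustration.
TRUE_K, H0, DT, STEPS = 0.05, 0.25, 0.1, 50
observed = [(math.sqrt(H0) - 0.5 * TRUE_K * i * DT) ** 2 for i in range(STEPS + 1)]
k_hat = fit_drainage(observed, H0, DT)
print(f"true k = {TRUE_K}, recovered k = {k_hat:.4f}")
```

The structure is the same as in the paragraph above: the form of the ODE is fixed by the user, and only its parameter is searched for. With real video the observations would be noisy, and a one-dimensional search would no longer be enough once latent forcing or several parameters are involved.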

	## Key contributions

	- Multimodal dynamical parameter extraction from video, audio, and time series reconstructed from visual charts.
	- Recovery under unobserved forcing inputs by inferring latent actuation (e.g. wheel speed) from audio.
	- Estimation of implicit dynamics associated with unmeasured physical effects (e.g. frictional drag).
	- Invariant calibration from raw video, eliminating assumptions about known initial conditions or coordinate frames.
	- Extensive validation on 100+ scenarios: Delfys benchmark (75 videos), real-world rover and quadrotor, and simulation charts.

	## Architecture

	<p align="center"><img src="docs/EMMA-arc.png" alt="EMMA architecture" width="780" /></p>

	EMMA follows a three-step pipeline: Sense · Learn · Verify.

	1. Sense. Video, audio, and chart images are converted into time-aligned signals through modality-specific pipelines.
	2. Learn. A Liquid Time-Constant (LTC) network models the system's latent dynamics in continuous time.
	3. Verify. A differentiable ODE solver simulates the recovered parameters and checks them against the observations under a physics-informed loss.
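
As a rough illustration of the Learn step, the sketch below advances a single Liquid Time-Constant neuron using the standard LTC state equation dx/dt = -(1/τ + f)·x + f·A and its fused semi-implicit Euler update. This is the generic textbook formulation, not EMMA's network: the single-neuron setup, the sigmoid gate, and all weights and constants are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ltc_fused_step(x, inp, dt, tau=1.0, A=1.0, w_rec=0.5, w_in=1.0, bias=0.0):
    """One fused semi-implicit Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    f = sigmoid(w_rec * x + w_in * inp + bias)  # input-dependent gate, always in (0, 1)
    # Treat the decay term implicitly: x_new * (1 + dt * (1/tau + f)) = x + dt * f * A
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))


def run_neuron(inputs, dt=0.1, x0=0.0):
    """Roll the neuron forward over a sequence of scalar inputs."""
    states = [x0]
    for inp in inputs:
        states.append(ltc_fused_step(states[-1], inp, dt))
    return states


# Hypothetical input signal standing in for a time-aligned sensed observation.
inputs = [math.sin(0.2 * i) for i in range(100)]
states = run_neuron(inputs)
print(f"state range: [{min(states):.3f}, {max(states):.3f}]")
```

Because the gate `f` multiplies the state, the effective time constant τ / (1 + τ·f) changes with the input, which is where the "liquid" name comes from. With a non-negative gate and `A = 1`, the state started at 0 stays inside [0, 1].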

	## Results

	EMMA delivers accurate multi-parameter recovery across diverse physical systems. Full tables and ablations are in the [paper](docs/42612.pdf).

	| System | Parameters recovered | EMMA result | Best baseline |
	|--------|----------------------|-------------|---------------|
	| Pendulum (90 cm) | Length L, damping τ | L = 0.86 ± 0.07 m (GT 0.90) | Delfys, PySINDy |
	| Torricelli (med.) | Drainage k | 0.0132 ± 0.0008 (GT 0.0128) | matches Delfys |
	| Sliding block (med.) | Angle α, friction μ | α = 24.72°, μ = 0.205 (GT 25°, 0.20) | Delfys, PySINDy |
	| LED decay (med.) | γ | 0.91 ± 0.0 (GT 0.92) | matches Delfys |
	| Rover | 9 params (5 with known ground truth) | 8.8 % ± 1.7 % mean error | first work under hidden forcing |
	| Quadrotor | 12 params (7 with known ground truth) | 15.9 % ± 7.4 % mean error | first work under hidden forcing |
	| Simulation charts | Lotka-Volterra, Lorenz, F8 Crusader, HIV, AID | >10× lower error than PySINDy on implicit dynamics | PySINDy |

	EMMA is compared against PAIG, NIRPI, and Delfys on the video benchmarks and against PySINDy on the chart-based simulations.
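
The benchmark rows above report estimates next to ground truth (GT), while the rover and quadrotor rows report percentage errors. The short sketch below converts the reported benchmark estimates into relative errors so the rows can be read on a common scale. It assumes the plain definition |estimate − GT| / |GT|; the paper may define its mean error differently.

```python
def relative_error_percent(estimate, ground_truth):
    """Absolute relative error of an estimate, in percent of the ground truth."""
    return abs(estimate - ground_truth) / abs(ground_truth) * 100.0


# (estimate, ground truth) pairs copied from the results table.
reported = {
    "Pendulum length L (m)": (0.86, 0.90),
    "Torricelli drainage k": (0.0132, 0.0128),
    "Sliding block angle alpha (deg)": (24.72, 25.0),
    "Sliding block friction mu": (0.205, 0.20),
    "LED decay gamma": (0.91, 0.92),
}

errors = {name: relative_error_percent(est, gt) for name, (est, gt) in reported.items()}
for name, err in errors.items():
    print(f"{name}: {err:.1f} %")
```

Under this definition all five benchmark estimates fall within about 5 % of ground truth.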

	## Supported systems

	| Category | Systems |
	|----------|---------|
	| Delfys benchmark | Pendulum, Torricelli drainage, Sliding block, LED decay, Free fall |
	| Real-world platforms | Differential-drive rover (9 params), 6-DoF quadrotor (12 params) |
	| Simulation charts | Lotka-Volterra, Chaotic Lorenz, F8 Crusader, HIV therapy, AID (Type-1 diabetes) |

	## Installation

	Tested with Python 3.10+ on macOS and Linux.

	```bash
	git clone https://github.com/ImpactLabASU/EMMA-CVPR2026.git
	cd EMMA-CVPR2026
	python3 -m venv .venv && source .venv/bin/activate # optional but recommended
	pip install -r requirements.txt
	```

	### System tools

	- [FFmpeg](https://ffmpeg.org/) on your `PATH` (MoviePy uses it for audio extraction): `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu).
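	- YOLO weights (default `yolo11m.pt`): run `pip install ultralytics`; Ultralytics normally downloads official weights such as `yolo11m.pt` automatically the first time they are loaded by name. Alternatively, download the file from the Ultralytics releases page.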
	- A CUDA GPU is optional; every script falls back to CPU automatically.
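
Before running a pipeline, the three prerequisites above can be checked with a few lines of Python. This is a hypothetical helper, not a script from the repository; `check_environment` and its report keys are names invented for the example, and the CUDA check runs only if `torch` is importable.

```python
import shutil
from pathlib import Path


def check_environment(weights_path="yolo11m.pt"):
    """Report whether FFmpeg, the YOLO weights file, and a CUDA device are available."""
    report = {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "weights_found": Path(weights_path).is_file(),
    }
    try:
        import torch  # imported lazily so the check still runs without PyTorch
        report["device"] = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        report["device"] = "torch not installed"
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```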

	## Repository layout

	| Folder | Purpose | Entry points |
	| --- | --- | --- |
	| `Baseline/` | Physics-informed EMMA pipelines (Free Fall, LED, Pendulum, Sliding Block, Torricelli) plus ablation utilities. | `FreeFall/free_fall.py`, `LED/led.py`, `Pendulum/run-.py`, `Sliding block/sliding_block.py`, `Torricelli/toricelli*.py`, `architecture_ablation.py`, `run_additional_ablations.py` |
	| `Rover/` | Rover perception, parameter estimation, multimodal ablations, helper shell script. | `run.py`, `rover-ablation.py`, `rover_multimodal_ablation.py`, `run_rover_ablation.sh` |
	| `Drone/` | Drone pipeline orchestrator (vision + audio + EMMA optimization). | `new_run.py` |
	| `CGM/` | Continuous glucose monitor chart digitizer. | `extract_cgm_data.py` |

	## Data

	- Baseline datasets come from the Delfys "Physical Parameter Prediction" set on Kaggle (https://www.kaggle.com/datasets/jaswar/physical-parameter-prediction). Download it and copy the experiment folders into `Baseline/`; the scripts discover the data automatically.
	- Sample rover and drone videos are available here: [Dropbox](https://www.dropbox.com/scl/fo/cjiym1h53puvv2ml6o8vn/APkfhTz64DnkYkHt554ZPj0?rlkey=hw3odtpzn6vl2nsfbe4pkekcq&dl=0). Place them under `Rover/` and `Drone/`.

	## Usage

	### Baseline pipelines

	Each baseline follows the same recipe:

	1. `cd Baseline/<Experiment>/`
	2. Edit the configuration block inside `main()`:
	   - `video_path`: path to the source video; leave empty to reuse existing data files.
	   - `weights_path`: YOLO weights (`yolo11m.pt` by default).
	   - `pixel_to_meter` (Free Fall, Torricelli, Sliding Block): set from your calibration grid.
	   - `output_folder`: a unique run directory (e.g. `run_01`); the script creates `output/` and `data/` under it.
	3. Run `python3 <script>.py`.
	4. Optional: `python3 <script>.py --simulation-only` skips retraining and reuses the latest `_coefficients.csv` and `_emma_final_model.pth` (Free Fall, LED, Pendulum).
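
The recipe above can be pictured with the following sketch. It is a hypothetical stand-in, not code from the repository: the actual configuration block in each script may be laid out differently, and the helper names and calibration numbers (a 10 cm grid cell spanning 250 px) are assumptions. Only the setting names and the `--simulation-only` flag come from the steps above.

```python
import argparse
from pathlib import Path


def pixel_to_meter_from_grid(spacing_m, spacing_px):
    """Meters per pixel, from one calibration-grid cell measured in both units."""
    if spacing_px <= 0:
        raise ValueError("spacing_px must be positive")
    return spacing_m / spacing_px


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical baseline runner")
    parser.add_argument("--simulation-only", action="store_true",
                        help="skip retraining and reuse saved coefficients and model")
    return parser.parse_args(argv)


def build_config(output_folder="run_01"):
    """Collect the settings listed in step 2 into one dictionary."""
    run_dir = Path(output_folder)
    return {
        "video_path": "",              # empty string: reuse existing data files
        "weights_path": "yolo11m.pt",  # default YOLO weights
        "pixel_to_meter": pixel_to_meter_from_grid(0.10, 250.0),  # hypothetical grid
        "output_dir": run_dir / "output",
        "data_dir": run_dir / "data",
    }


def prepare_run(config):
    """Create the output/ and data/ directories under the run directory."""
    for key in ("output_dir", "data_dir"):
        config[key].mkdir(parents=True, exist_ok=True)


# Example: parse an explicit argument list instead of sys.argv.
args = parse_args(["--simulation-only"])
config = build_config("run_01")
print(args.simulation_only, config["pixel_to_meter"])
```

Calling `prepare_run(config)` would then create `run_01/output/` and `run_01/data/`. Using a new `output_folder` per run keeps earlier results from being overwritten.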

	| Experiment | Script | Key outputs |
	| --- | --- | --- |
	| Free Fall | `FreeFall/free_fall.py` (`free_fall-m.py` for the medium set) | trajectory CSV, `free_fall_coefficients.csv`, trained model, annotated video |
	| LED decay | `LED/led.py` | trajectory CSV, `led_coefficients.csv`, trained model, intensity figures |
	| Pendulum | `Pendulum/run-45.py`, `run-90.py`, `run-150.py` | `thetaData.txt`, `omegaData.txt`, `pendulum_coefficients.csv`, trained model |
	| Sliding block | `Sliding block/sliding_block.py` (`-low`, `-med` variants) | trajectory CSVs, `sliding_block_coefficients.csv`, trained model |
	| Torricelli | `Torricelli/toricelli.py` (`toricelli-m.py`, `torricelli-sm.py`) | height trajectories, `torricelli_coefficients.csv`, trained model |

	PySINDy baselines. Each experiment folder has `pysindy_results/pysindy.py`. Run it from that folder, after the main pipeline has written the EMMA-formatted CSVs, to produce the sparse-regression baselines.

	Ablations. From `Baseline/`: `python3 architecture_ablation.py` and `python3 run_additional_ablations.py` (require pendulum datasets under `Baseline/Pendulum-EMMA/<angle>_v*/data/`).

	### Rover

	```bash
	cd Rover
	# set video_path and weights_path in run.py (see the CONFIGURATION SECTION)
	python3 run.py
	```

	Outputs: `rover_coefficients.csv`, `rover_EMMA_final_model.pth`, plots, GIF. Ablations: `python3 rover-ablation.py`, `python3 rover_multimodal_ablation.py`, or `bash run_rover_ablation.sh` (edit variables first). If you already have processed `data/*.txt`, set `video_path = ""` to skip detection.

	### Drone

	```bash
	cd Drone
	EMMA_RUN_ORCHESTRATOR=1 python3 new_run.py --video /path/to/DroneVideo.mp4 --weights /path/to/yolo11m.pt
	```

	> Note: Full orchestration also needs an external `Dronepipeline/` folder containing `droneExtract.py`, `droneExtractAudio.py`, and `EMMA_drone_torch_ltc_optimized.py`. These are not bundled here; without them, `new_run.py` falls back to the local vision-only pipeline.
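
To see in advance whether a run will use the full orchestration or the vision-only fallback, a small pre-flight check can list which of the three external files are missing. The helper below is a hypothetical sketch and not part of `new_run.py`. Where `Dronepipeline/` must sit relative to `Drone/` is not stated here, so the folder path is left as a parameter.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = (
    "droneExtract.py",
    "droneExtractAudio.py",
    "EMMA_drone_torch_ltc_optimized.py",
)


def missing_orchestrator_files(pipeline_dir):
    """Return the required orchestration files that are absent from pipeline_dir."""
    root = Path(pipeline_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_orchestrator_files("Dronepipeline")  # path is an assumption
if missing:
    print("Expect the vision-only fallback; missing:", ", ".join(missing))
else:
    print("All orchestration files found.")
```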

	### CGM chart digitizer

	```bash
	cd CGM
	python3 extract_cgm_data.py # reads CGMData.png, writes cgm_data.txt + a visualization
	```
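
The core of any chart digitizer is the mapping from pixel coordinates to data values. The sketch below shows that mapping for linear axes, calibrated from two reference ticks per axis. It is a generic illustration rather than the logic of `extract_cgm_data.py`; the tick positions, units (hours and mg/dL), and sample points are made-up values.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Build a linear pixel-to-value mapping from two reference ticks on one axis."""
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must be at different pixel positions")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration: x pixels 100..700 span 0..24 hours,
# y pixels 500..100 span 40..400 mg/dL (image rows grow downward).
to_hours = make_axis_mapper(100, 0.0, 700, 24.0)
to_glucose = make_axis_mapper(500, 40.0, 100, 400.0)

# Hypothetical curve points detected in the image, as (column, row) pixels.
curve_pixels = [(100, 400), (400, 300), (700, 350)]
series = [(to_hours(px), to_glucose(py)) for px, py in curve_pixels]
for hours, glucose in series:
    print(f"{hours:5.1f} h  {glucose:6.1f} mg/dL")
```

Because the y mapper is calibrated from two ticks, the inverted image row direction is handled by the negative scale with no extra flip step.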

	## Troubleshooting

	- Module not found: re-run `pip install -r requirements.txt` in the active virtual environment. For `torch`/`torchvision`, use the [PyTorch selector](https://pytorch.org/get-started/locally/).
	- YOLO weights missing: download `yolo11m.pt` and point `weights_path` to it.
	- FFmpeg errors: install FFmpeg (`brew install ffmpeg` / `sudo apt install ffmpeg`).

	## Citation

	```bibtex
	@InProceedings{Shaikh_2026_CVPR,
	author = {Shaikh, Farhat and Banerjee, Ayan and Gupta, Sandeep},
	title = {EMMA: Extracting Multiple physical parameters from Multimodal Data},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	month = {June},
	year = {2026},
	pages = {1716-1725}
	}
	```

	Also on [arXiv](https://arxiv.org/abs/2605.24047).