Spaces:

ASU-IMPACT-Lab
/

EMMA-CVPR2026

Configuration error

EMMA-CVPR2026

File size: 9,542 Bytes



# EMMA: Extracting Multiple physical parameters from Multimodal Data

**CVPR 2026**

[Farhat Shaikh](https://scholar.google.com/citations?hl=en&user=mbAOSW0AAAAJ), [Ayan Banerjee](https://scholar.google.com/citations?user=UAlc7tEAAAAJ&hl=en), [Sandeep K. S. Gupta](https://scholar.google.com/citations?user=U9bcQkMAAAAJ&hl=en)

**IMPACT Lab, School of Computing & Augmented Intelligence (SCAI), Arizona State University**

[**Project page**](https://impactlabasu.github.io/EMMA-CVPR2026/) · [**Demo video**](https://youtu.be/Uo79pVlM6Rk)

---

## Overview

EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, and assumptions about known initial conditions, EMMA performs joint inference of **explicit parameters**, **implicit dynamical components**, and **calibration invariants** within a unified continuous-time model.

The user supplies the parametric structure of the governing ODE; EMMA solves the inverse problem of recovering its parameters, along with any latent forcing and invariants, from multimodal observations.

## Key contributions

- **Multi-modal dynamical parameter extraction** from video, audio, and time-series reconstructed from visual charts.
- **Recovery under unobserved forcing inputs** by inferring latent actuation (e.g. wheel speed) from audio.
- **Estimation of implicit dynamics** associated with unmeasured physical effects (e.g. frictional drag).
- **Invariant calibration from raw video**, eliminating assumptions about known initial conditions or coordinate frames.
- **Extensive validation** on 100+ scenarios: Delfys benchmark (75 videos), real-world rover and quadrotor, and simulation charts.

## Architecture

<p align="center"><img src="docs/EMMA-arc.png" alt="EMMA architecture" width="780" /></p>

EMMA follows a three-step pipeline: **Sense · Learn · Verify**.

1. **Sense.** Video, audio, and chart images are converted into time-aligned signals through modality-specific pipelines.
2. **Learn.** A Liquid Time-Constant (LTC) network models the system's latent dynamics in continuous time.
3. **Verify.** A differentiable ODE solver simulates the recovered parameters and checks them against the observations under a physics-informed loss.

## Results

EMMA delivers accurate multi-parameter recovery across diverse physical systems. Full tables and ablations are in the [paper](docs/42612.pdf).

| System | Parameters recovered | EMMA error | Best baseline |
|--------|----------------------|------------|---------------|
| Pendulum (90 cm) | Length *L*, damping *τ* | **L = 0.86 ± 0.07 m** (GT 0.90) | Delfys, PySINDy |
| Torricelli (med.) | Drainage *k* | **0.0132 ± 0.0008** (GT 0.0128) | matches Delfys |
| Sliding block (med.) | Angle *α*, friction *μ* | **α = 24.72°, μ = 0.205** (GT 25°, 0.20) | Delfys, PySINDy |
| LED decay (med.) | γ | **0.91 ± 0.0** (GT 0.92) | matches Delfys |
| Rover | 9 params (5 with known ground truth) | **8.8 % ± 1.7 %** mean error | *first work under hidden forcing* |
| Quadrotor | 12 params (7 with known ground truth) | **15.9 % ± 7.4 %** mean error | *first work under hidden forcing* |
| Simulation charts | Lotka-Volterra, Lorenz, F8 Crusader, HIV, AID | **>10× lower error** than PySINDy on implicit dynamics | PySINDy |

Compared against **PAIG**, **NIRPI**, and **Delfys** on the video benchmarks and **PySINDy** on the chart-based simulations.

## Supported systems

| Category | Systems |
|----------|---------|
| Delfys benchmark | Pendulum, Torricelli drainage, Sliding block, LED decay, Free fall |
| Real-world platforms | Differential-drive rover (9 params), 6-DoF quadrotor (12 params) |
| Simulation charts | Lotka-Volterra, Chaotic Lorenz, F8 Crusader, HIV therapy, AID (Type-1 diabetes) |

## Installation

Tested with **Python 3.10+** on macOS and Linux.

```bash
git clone https://github.com/ImpactLabASU/EMMA-CVPR2026.git
cd EMMA-CVPR2026
python3 -m venv .venv && source .venv/bin/activate   # optional but recommended
pip install -r requirements.txt
```

**System tools**

- [FFmpeg](https://ffmpeg.org/) on your `PATH` (MoviePy uses it for audio extraction): `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu).
- YOLO weights (default `yolo11m.pt`): `pip install ultralytics` then `yolo download model=yolo11m.pt`, or download from the Ultralytics releases page.
- A CUDA GPU is optional; every script falls back to CPU automatically.

## Repository layout

| Folder | Purpose | Entry points |
| --- | --- | --- |
| `Baseline/` | Physics-informed EMMA pipelines (Free Fall, LED, Pendulum, Sliding Block, Torricelli) plus ablation utilities. | `FreeFall/free_fall.py`, `LED/led.py`, `Pendulum/run-*.py`, `Sliding block/sliding_block*.py`, `Torricelli/toricelli*.py`, `architecture_ablation.py`, `run_additional_ablations.py` |
| `Rover/` | Rover perception, parameter estimation, multimodal ablations, helper shell script. | `run.py`, `rover-ablation.py`, `rover_multimodal_ablation.py`, `run_rover_ablation.sh` |
| `Drone/` | Drone pipeline orchestrator (vision + audio + EMMA optimization). | `new_run.py` |
| `CGM/` | Continuous glucose monitor chart digitizer. | `extract_cgm_data.py` |

## Data

- **Baseline datasets** come from the Delfys "Physical Parameter Prediction" set on Kaggle (https://www.kaggle.com/datasets/jaswar/physical-parameter-prediction). Download it and copy the experiment folders into `Baseline/`; the scripts discover the data automatically.
- **Sample rover and drone videos** are available here: **[Dropbox](https://www.dropbox.com/scl/fo/cjiym1h53puvv2ml6o8vn/APkfhTz64DnkYkHt554ZPj0?rlkey=hw3odtpzn6vl2nsfbe4pkekcq&dl=0)**. Place them under `Rover/` and `Drone/`.

## Usage

### Baseline pipelines

Each baseline follows the same recipe:

1. `cd Baseline/<Experiment>/`
2. Edit the configuration block inside `main()`:
   - `video_path`: path to the source video; leave empty to reuse existing data files.
   - `weights_path`: YOLO weights (`yolo11m.pt` by default).
   - `pixel_to_meter` (Free Fall, Torricelli, Sliding Block): set from your calibration grid.
   - `output_folder`: a unique run directory (e.g. `run_01`); the script creates `output/` and `data/` under it.
3. Run `python3 <script>.py`.
4. Optional: `python3 <script>.py --simulation-only` skips retraining and reuses the latest `*_coefficients.csv` and `*_emma_final_model.pth` (Free Fall, LED, Pendulum).

| Experiment | Script | Key outputs |
| --- | --- | --- |
| Free Fall | `FreeFall/free_fall.py` (`free_fall-m.py` for the medium set) | trajectory CSV, `free_fall_coefficients.csv`, trained model, annotated video |
| LED decay | `LED/led.py` | trajectory CSV, `led_coefficients.csv`, trained model, intensity figures |
| Pendulum | `Pendulum/run-45.py`, `run-90.py`, `run-150.py` | `thetaData.txt`, `omegaData.txt`, `pendulum_coefficients.csv`, trained model |
| Sliding block | `Sliding block/sliding_block.py` (`-low`, `-med` variants) | trajectory CSVs, `sliding_block_coefficients.csv`, trained model |
| Torricelli | `Torricelli/toricelli.py` (`toricelli-m.py`, `torricelli-sm.py`) | height trajectories, `torricelli_coefficients.csv`, trained model |

**PySINDy baselines.** Each experiment folder has `pysindy_results/pysindy.py`; run it from that folder (after the main pipeline has written the EMMA-formatted CSVs) for sparse-regression baselines.

**Ablations.** From `Baseline/`: `python3 architecture_ablation.py` and `python3 run_additional_ablations.py` (require pendulum datasets under `Baseline/Pendulum-EMMA/<angle>_v*/data/`).

### Rover

```bash
cd Rover
# set video_path and weights_path in run.py (see the CONFIGURATION SECTION)
python3 run.py
```

Outputs: `rover_coefficients.csv`, `rover_EMMA_final_model.pth`, plots, GIF. Ablations: `python3 rover-ablation.py`, `python3 rover_multimodal_ablation.py`, or `bash run_rover_ablation.sh` (edit variables first). If you already have processed `data/*.txt`, set `video_path = ""` to skip detection.

### Drone

```bash
cd Drone
EMMA_RUN_ORCHESTRATOR=1 python3 new_run.py --video /path/to/DroneVideo.mp4 --weights /path/to/yolo11m.pt
```

> **Note:** Full orchestration also needs an external `Dronepipeline/` folder containing `droneExtract.py`, `droneExtractAudio.py`, and `EMMA_drone_torch_ltc_optimized.py`. These are not bundled here; without them, `new_run.py` falls back to the local vision-only pipeline.

### CGM chart digitizer

```bash
cd CGM
python3 extract_cgm_data.py   # reads CGMData.png, writes cgm_data.txt + a visualization
```

## Troubleshooting

- **Module not found:** re-run `pip install -r requirements.txt` in the active virtual environment. For `torch`/`torchvision`, use the [PyTorch selector](https://pytorch.org/get-started/locally/).
- **YOLO weights missing:** download `yolo11m.pt` and point `weights_path` to it.
- **FFmpeg errors:** install FFmpeg (`brew install ffmpeg` / `sudo apt install ffmpeg`).

## Citation

```bibtex
@InProceedings{Shaikh_2026_CVPR,
    author    = {Shaikh, Farhat and Banerjee, Ayan and Gupta, Sandeep},
    title     = {EMMA: Extracting Multiple physical parameters from Multimodal Data},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {1716-1725}
}
```

Also on [arXiv](https://arxiv.org/abs/2605.24047).