| --- |
| language: |
| - en |
| - fr |
| pipeline_tag: video-classification |
| tags: |
| - human-activity-recognition |
| - multimodal |
| - sensor-fusion |
| - edge-ai |
| - privacy-preserving |
| - pytorch |
| - sparse-transformer |
| --- |
| # SAM-MM-HAR |
|
|
| **SAM-MM-HAR** is a lightweight multimodal Human Activity Recognition model |
| built by **AMEFORGE Lab** (Amega Mike) on a proprietary sparse Transformer |
| architecture. It classifies 40 daily activities from privacy-preserving |
| non-RGB sensors: Depth, Skeleton, IMU, mmWave Radar, IR and Thermal. |
|
|
| It was developed for the **CUHK-X Multimodal Human Activity Challenge** |
| (co-located with UbiComp 2026). |
|
|
| ## Key specs |
|
|
| | Property | Value | |
| |---|---| |
| | Architecture | Sparse Transformer (proprietary — AMEFORGE) | |
| | Parameters | {n_params:,} (~{n_params/1e6:.1f}M) | |
| | Size on disk | {size:.1f} MB | |
| | Classes | 40 daily activities | |
| | Modalities | Depth · Skeleton · IMU · mmWave · IR · Thermal | |
| | Val accuracy | {val_acc:.1f}% (cross-subject) | |
| | Edge ready | ✅ CPU inference < 100 MB | |
| |
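The `{n_params}`, `{size}` and `{val_acc}` fields in the table above are template placeholders that are filled in when the card is generated. The sketch below shows how such values could be rendered with the same format specifiers. The helper name `render_key_specs`, the 1 MB = 2**20 bytes convention and the numbers in the usage lines are assumptions for illustration only, not the model's real figures.

```python
from typing import Dict


def render_key_specs(n_params: int, size_bytes: int, val_acc: float) -> Dict[str, str]:
    """Format the template fields of the key-specs table (hypothetical helper)."""
    size = size_bytes / (1024 ** 2)  # assumption: 1 MB = 2**20 bytes
    return {
        "Parameters": f"{n_params:,} (~{n_params/1e6:.1f}M)",
        "Size on disk": f"{size:.1f} MB",
        "Val accuracy": f"{val_acc:.1f}% (cross-subject)",
    }


# Dummy values for illustration only; they do not describe SAM-MM-HAR.
specs = render_key_specs(n_params=1_234_567, size_bytes=5 * 1024 ** 2, val_acc=80.0)
for name, value in specs.items():
    print(f"| {name} | {value} |")
```

With a PyTorch model, `n_params` would typically come from `sum(p.numel() for p in model.parameters())` and `size_bytes` from `os.path.getsize` on the saved checkpoint.
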
| ## Modalities |
| |
| The model tolerates missing modalities: any subset of the six modalities can be supplied at inference. |
| |
| | Modality | Encoder type | |
| |---|---| |
| | Depth | Patch Conv2D + sparse attention | |
| | IR / Thermal | Patch Conv2D + sparse attention | |
| | Skeleton | Joint linear + sparse attention | |
| | IMU (6-axis) | Conv1D temporal | |
| | mmWave Radar | Patch Conv2D + sparse attention | |
| |
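How the model combines whichever modalities are present is not disclosed. One common way to make a fusion step tolerate missing inputs is to average only the embeddings that are available, and the sketch below illustrates that idea. The function `fuse_available`, the modality keys and the embedding width are hypothetical and are not taken from the SAM-MM-HAR code.

```python
from typing import Dict, Optional

import torch

# Hypothetical modality keys, chosen for this example only.
MODALITIES = ("depth", "skeleton", "imu", "mmwave", "ir", "thermal")


def fuse_available(embeddings: Dict[str, Optional[torch.Tensor]]) -> torch.Tensor:
    """Average the embeddings of the modalities that are present.

    Each tensor has shape (batch, dim); a missing modality is None or absent.
    """
    present = [embeddings[m] for m in MODALITIES if embeddings.get(m) is not None]
    if not present:
        raise ValueError("at least one modality is required")
    # Stack to (n_present, batch, dim) and take the mean over modalities.
    return torch.stack(present, dim=0).mean(dim=0)


batch, dim = 2, 8
fused = fuse_available({
    "imu": torch.ones(batch, dim),
    "skeleton": 3 * torch.ones(batch, dim),
    "depth": None,  # sensor unavailable for this clip
})
print(fused.shape)
```

Averaging keeps the fused embedding on the same scale regardless of how many sensors contributed, which is one reason it is a frequent baseline for missing-modality fusion.
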
| A **MotionCore** temporal world-model (a GRU over per-frame embeddings) |
| models human movement dynamics across frames. This is its main advantage |
| over classifiers that score each frame independently. |
| |
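As a rough illustration of a GRU over per-frame embeddings, the sketch below runs `torch.nn.GRU` over a sequence of frame features and classifies the clip from the final hidden state. The class name `TemporalGRUHead` and all dimensions are assumptions. This is not the MotionCore implementation, whose internals are not disclosed.

```python
import torch
from torch import nn


class TemporalGRUHead(nn.Module):
    """GRU over per-frame embeddings followed by a linear classifier (illustrative)."""

    def __init__(self, embed_dim: int = 64, hidden_dim: int = 128, num_classes: int = 40):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, embed_dim), one embedding per frame
        _, last_hidden = self.gru(frames)        # last_hidden: (1, batch, hidden_dim)
        return self.classifier(last_hidden[-1])  # logits: (batch, num_classes)


head = TemporalGRUHead()
clip = torch.randn(2, 16, 64)  # 2 clips, 16 frames each
logits = head(clip)
print(logits.shape)  # torch.Size([2, 40])
```

Because the hidden state is carried from one frame to the next, the prediction depends on the order of the frames, which averaging independent per-frame predictions cannot capture.
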
| ## Classes (40) |
| |
| Wash_face · Brush_teeth · Wash_hands · Comb_hair · Put/Take_off_glasses · |
| Put/Take_off_clothes · Put/Take_off_shoes · Drink_water · Eat · Read_book · |
| Write · Use_phone · Use_laptop · Sit_down · Stand_up · Lie_down · Get_up · |
| Walk · Run · Jump · Clap · Wave · Point · Throw · Kick · Pick_up · |
| Put_down · Open/Close_door · Turn_on/off_light · Sweep_floor · Vacuum · |
| Fall_down · Check_time · Take_body_temperature |
| |
| ## Inference |
| |
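| ```python |
| import torch |
| from huggingface_hub import hf_hub_download |
|
| # Download the checkpoint from the Hub; the call returns its local cache path. |
| ckpt = hf_hub_download(repo_id="AMFORGE/sam-mm-har", filename="best.pt") |
| print(ckpt) |
|
| # Run inference.py from the repo on a clip folder (use the printed path for --checkpoint): |
| # python inference.py --checkpoint best.pt --clip /path/to/clip_folder |
| ``` |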
| |
| ## Citation |
| |
| If you use SAM-MM-HAR, please cite: |
| |
| ```bibtex |
| @misc{{sam_mm_har, |
| title = {{SAM-MM-HAR: Multimodal Human Activity Recognition |
| on Privacy-Preserving Sensors}}, |
| author = {{AM}}, |
| year = {{2026}}, |
| note = {{AMEFORGE Lab. Built on a proprietary sparse Transformer architecture.}}, |
| }} |
| ``` |
| |
| --- |
| *Architecture internals are proprietary and not disclosed. © AMEFORGE Lab 2026* |
| """ |