
	---
	language:
	- en
	- fr
	pipeline_tag: video-classification
	tags:
	- human-activity-recognition
	- multimodal
	- sensor-fusion
	- edge-ai
	- privacy-preserving
	- pytorch
	- sparse-transformer
	---
	# SAM-MM-HAR

	SAM-MM-HAR is a lightweight multimodal Human Activity Recognition model
	built by AMEFORGE Lab (Amega Mike) on a proprietary sparse Transformer
	architecture. It classifies 40 daily activities from privacy-preserving
	non-RGB sensors: Depth, Skeleton, IMU, mmWave Radar, IR and Thermal.

	Developed for the CUHK-X Multimodal Human Activity Challenge
	(co-located with UbiComp 2026).

	## Key specs

	| Property | Value |
	|---|---|
	| Architecture | Sparse Transformer (proprietary, AMEFORGE) |
	| Parameters | {n_params:,} (~{n_params/1e6:.1f}M) |
	| Size on disk | {size:.1f} MB |
	| Classes | 40 daily activities |
	| Modalities | Depth · Skeleton · IMU · mmWave · IR · Thermal |
	| Val accuracy | {val_acc:.1f}% (cross-subject) |
	| Edge ready | ✅ CPU inference < 100 MB |
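
The Parameters and Size on disk rows use Python format specifiers (`{n_params:,}`, `{size:.1f}`). A minimal sketch of how such values could be computed and rendered follows. It assumes float32 weights, decimal megabytes, and a toy set of tensor shapes; the helper names and the shapes are hypothetical and do not describe the real SAM-MM-HAR layers.

```python
import math


def count_parameters(shapes):
    """Sum the element counts of every tensor shape in a state-dict-like mapping."""
    return sum(math.prod(shape) for shape in shapes.values())


def format_specs(n_params, size_bytes):
    """Render the two table cells with the format specifiers used in the card."""
    size = size_bytes / 1e6  # assumption: decimal megabytes
    return {
        "Parameters": f"{n_params:,} (~{n_params/1e6:.1f}M)",
        "Size on disk": f"{size:.1f} MB",
    }


# Toy shapes for illustration only (NOT the real model).
toy_shapes = {
    "encoder.weight": (1024, 1024),
    "encoder.bias": (1024,),
    "head.weight": (40, 1024),
    "head.bias": (40,),
}

n_params = count_parameters(toy_shapes)
# Assumption: 4 bytes per parameter (float32), ignoring file-format overhead.
rows = format_specs(n_params, size_bytes=n_params * 4)
print(rows)
```

With a loaded PyTorch module, the same count is usually obtained with `sum(p.numel() for p in model.parameters())`.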

	## Modalities

	The model tolerates missing modalities: any subset of the supported sensors can be supplied at inference.

	| Modality | Encoder type |
	|---|---|
	| Depth | Patch Conv2D + sparse attention |
	| IR / Thermal | Patch Conv2D + sparse attention |
	| Skeleton | Joint linear + sparse attention |
	| IMU (6-axis) | Conv1D temporal |
	| mmWave Radar | Patch Conv2D + sparse attention |
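
One common way to accept an arbitrary subset of modalities is to average only the embeddings that are present. The sketch below illustrates that idea with plain Python lists. It is an assumption for illustration: the actual fusion used by SAM-MM-HAR is proprietary and not disclosed, and `fuse_available` and the modality keys are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical keys for the six sensor streams named in this card.
MODALITIES = ["depth", "skeleton", "imu", "mmwave", "ir", "thermal"]


def fuse_available(embeddings: Dict[str, Optional[List[float]]]) -> List[float]:
    """Average the embeddings of the modalities that are present (not None)."""
    present = [embeddings[name] for name in MODALITIES
               if embeddings.get(name) is not None]
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(present[0])
    if any(len(vec) != dim for vec in present):
        raise ValueError("all embeddings must share the same dimension")
    # Element-wise mean over the available modalities only.
    return [sum(vec[i] for vec in present) / len(present) for i in range(dim)]


# A clip with depth and IMU available, thermal explicitly missing,
# and the remaining modalities absent from the dict.
clip = {"depth": [1.0, 2.0], "imu": [3.0, 4.0], "thermal": None}
fused = fuse_available(clip)
print(fused)
```

Because the mean is taken over present modalities only, the fused vector keeps the same dimension and scale whether one sensor or all six are supplied.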

	A MotionCore temporal world-model (a GRU over per-frame embeddings)
	models human movement dynamics across frames, which is the key advantage
	over standard frame-by-frame classifiers.
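
The idea of running a GRU over per-frame embeddings can be sketched in PyTorch as follows. This is a generic illustration, not the MotionCore implementation, whose internals are not disclosed. The class name `TemporalGRUHead`, the embedding size (128), the hidden size (64), and the 16-frame clip length are all assumptions; only the 40 output classes come from this card.

```python
import torch
from torch import nn


class TemporalGRUHead(nn.Module):
    """Hypothetical temporal head: GRU over frame embeddings, then a classifier."""

    def __init__(self, embed_dim: int = 128, hidden_dim: int = 64, num_classes: int = 40):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_embeddings: torch.Tensor) -> torch.Tensor:
        # frame_embeddings: (batch, frames, embed_dim)
        _, last_hidden = self.gru(frame_embeddings)  # (1, batch, hidden_dim)
        # The final hidden state summarises the whole frame sequence.
        return self.classifier(last_hidden[-1])      # (batch, num_classes)


torch.manual_seed(0)
head = TemporalGRUHead()
clips = torch.randn(2, 16, 128)  # 2 clips, 16 frames each, 128-d embeddings
with torch.no_grad():
    logits = head(clips)
print(logits.shape)
```

Unlike a frame-by-frame classifier, the prediction here depends on the order of frames, since the GRU hidden state carries information forward through the clip.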

	## Classes (40)

	Wash_face · Brush_teeth · Wash_hands · Comb_hair · Put/Take_off_glasses ·
	Put/Take_off_clothes · Put/Take_off_shoes · Drink_water · Eat · Read_book ·
	Write · Use_phone · Use_laptop · Sit_down · Stand_up · Lie_down · Get_up ·
	Walk · Run · Jump · Clap · Wave · Point · Throw · Kick · Pick_up ·
	Put_down · Open/Close_door · Turn_on/off_light · Sweep_floor · Vacuum ·
	Fall_down · Check_time · Take_body_temperature

	## Inference

	```python
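	from huggingface_hub import hf_hub_download

	# Download the checkpoint from the Hub; hf_hub_download returns the local cache path.
	ckpt = hf_hub_download(repo_id="AMFORGE/sam-mm-har", filename="best.pt")
	print(ckpt)

	# Load it with inference.py from the repo, pointing --checkpoint at that path:
	#   python inference.py --checkpoint best.pt --clip /path/to/clip_folder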
	```

	## Citation

	If you use SAM-MM-HAR, please cite:

	```bibtex
	@misc{sam_mm_har,
	  title  = {SAM-MM-HAR: Multimodal Human Activity Recognition on Privacy-Preserving Sensors},
	  author = {AM},
	  year   = {2026},
	  note   = {AMEFORGE Lab. Built on a proprietary sparse Transformer architecture.},
	}
	```

	---
	Architecture internals are proprietary and not disclosed. © AMEFORGE Lab 2026