---
language:
- en
- fr
pipeline_tag: video-classification
tags:
- human-activity-recognition
- multimodal
- sensor-fusion
- edge-ai
- privacy-preserving
- pytorch
- sparse-transformer
---
# SAM-MM-HAR

**SAM-MM-HAR** is a lightweight multimodal Human Activity Recognition model
built by **AMEFORGE Lab** (Amega Mike) on a proprietary sparse Transformer
architecture. It classifies 40 daily activities from privacy-preserving
non-RGB sensors: Depth, Skeleton, IMU, mmWave Radar, IR and Thermal.

Developed for the **CUHK-X Multimodal Human Activity Challenge**
(co-located with UbiComp 2026).

## Key specs

| Property | Value |
|---|---|
| Architecture | Sparse Transformer (proprietary — AMEFORGE) |
| Parameters | {n_params:,} (~{n_params/1e6:.1f}M) |
| Size on disk | {size:.1f} MB |
| Classes | 40 daily activities |
| Modalities | Depth · Skeleton · IMU · mmWave · IR · Thermal |
| Val accuracy | {val_acc:.1f}% (cross-subject) |
| Edge ready | ✅ CPU inference < 100 MB |
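
The parameter count and size rows can be measured directly from any PyTorch module. The sketch below is illustrative only: it uses a small stand-in network because the actual architecture is not disclosed, and `count_parameters` and `serialized_size_mb` are hypothetical helper names, not functions from this repo.

```python
import io

import torch
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Total number of parameter elements in the module."""
    return sum(p.numel() for p in model.parameters())


def serialized_size_mb(model: nn.Module) -> float:
    """Size of the serialized state dict in megabytes (1 MB = 1e6 bytes)."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Stand-in network (assumption): NOT the SAM-MM-HAR architecture.
stand_in = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 40))

n_params = count_parameters(stand_in)
size = serialized_size_mb(stand_in)
print(f"Parameters: {n_params:,} (~{n_params / 1e6:.1f}M)")
print(f"Size on disk: {size:.1f} MB")
```

Serializing to an in-memory buffer avoids writing a temporary file; the size of a real checkpoint may differ if it also stores optimizer state or metadata.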

## Modalities

The model tolerates missing modalities: any subset of the sensors listed below can be supplied at inference.

| Modality | Encoder type |
|---|---|
| Depth | Patch Conv2D + sparse attention |
| IR / Thermal | Patch Conv2D + sparse attention |
| Skeleton | Joint linear + sparse attention |
| IMU (6-axis) | Conv1D temporal |
| mmWave Radar | Patch Conv2D + sparse attention |
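
One common way to accept any subset of modalities is to encode each available input separately and average the resulting embeddings. The sketch below illustrates that idea only; it is not the disclosed SAM-MM-HAR fusion method. `SubsetFusion`, the plain linear encoders, and the input dimensions are all assumptions chosen for brevity.

```python
import torch
from torch import nn


class SubsetFusion(nn.Module):
    """Hypothetical fusion head: averages the embeddings of present modalities."""

    def __init__(self, input_dims: dict, embed_dim: int = 64, num_classes: int = 40):
        super().__init__()
        # One encoder per modality; linear layers stand in for the real encoders.
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(dim, embed_dim) for name, dim in input_dims.items()}
        )
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, inputs: dict) -> torch.Tensor:
        present = [name for name in inputs if name in self.encoders]
        if not present:
            raise ValueError("at least one known modality is required")
        embeddings = [self.encoders[name](inputs[name]) for name in present]
        # Mean over modalities keeps the fused scale independent of how many are present.
        fused = torch.stack(embeddings, dim=0).mean(dim=0)
        return self.classifier(fused)


model = SubsetFusion({"skeleton": 75, "imu": 6, "depth": 256}).eval()
with torch.no_grad():
    full = model({
        "skeleton": torch.randn(2, 75),
        "imu": torch.randn(2, 6),
        "depth": torch.randn(2, 256),
    })
    partial = model({"imu": torch.randn(2, 6)})  # IMU only

print(full.shape, partial.shape)
```

Both calls return logits of the same shape, so downstream code does not need to know which sensors were available.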

A **MotionCore** temporal world model (a GRU over per-frame embeddings)
captures how human movement evolves across frames. This is its key advantage
over standard frame-by-frame classifiers.
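
A GRU over per-frame embeddings can be sketched in a few lines of PyTorch. This is a generic illustration under stated assumptions, not the MotionCore implementation: `TemporalGRUHead`, the embedding size, and the hidden size are hypothetical.

```python
import torch
from torch import nn


class TemporalGRUHead(nn.Module):
    """Hypothetical temporal head: GRU over frame embeddings, then a classifier."""

    def __init__(self, embed_dim: int = 64, hidden_dim: int = 128, num_classes: int = 40):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_embeddings: torch.Tensor) -> torch.Tensor:
        # frame_embeddings: (batch, frames, embed_dim)
        _, last_hidden = self.gru(frame_embeddings)  # (num_layers, batch, hidden_dim)
        # The final hidden state summarizes the whole clip in frame order.
        return self.classifier(last_hidden[-1])


head = TemporalGRUHead().eval()
clip = torch.randn(4, 16, 64)  # 4 clips, 16 frames, 64-dim embedding per frame
with torch.no_grad():
    logits = head(clip)

print(logits.shape)
```

Because the recurrent state depends on frame order, such a head can in principle separate activities that look alike in single frames but differ in direction of motion, for example Sit_down and Stand_up.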

## Classes (40)

Wash_face · Brush_teeth · Wash_hands · Comb_hair · Put/Take_off_glasses ·
Put/Take_off_clothes · Put/Take_off_shoes · Drink_water · Eat · Read_book ·
Write · Use_phone · Use_laptop · Sit_down · Stand_up · Lie_down · Get_up ·
Walk · Run · Jump · Clap · Wave · Point · Throw · Kick · Pick_up ·
Put_down · Open/Close_door · Turn_on/off_light · Sweep_floor · Vacuum ·
Fall_down · Check_time · Take_body_temperature

## Inference

```python
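from huggingface_hub import hf_hub_download

# Download the checkpoint; the return value is the local path of the cached file.
ckpt = hf_hub_download(repo_id="AMFORGE/sam-mm-har", filename="best.pt")
print(ckpt)

# Then run inference.py from the repo on a clip folder (shell command),
# passing the path printed above if best.pt is not in the working directory:
# python inference.py --checkpoint best.pt --clip /path/to/clip_folder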
```

## Citation

If you use SAM-MM-HAR, please cite:

```bibtex
@misc{sam_mm_har,
  title  = {{SAM-MM-HAR: Multimodal Human Activity Recognition
             on Privacy-Preserving Sensors}},
  author = {AM},
  year   = {2026},
  note   = {AMEFORGE Lab. Built on a proprietary sparse Transformer architecture.},
}
```

---
*Architecture internals are proprietary and not disclosed. © AMEFORGE Lab 2026*