PedSense -- KeypointLSTM Pedestrian Crossing Intent Classifier (JAAD, 50 epochs)
Overview
A lightweight LSTM classifier that predicts pedestrian crossing intent from normalized skeleton sequences. Each input is a sequence of T=5 frames where each frame contains 17 COCO body keypoints (x, y) relative to the pedestrian's bounding box center and height.
Keypoint sequences were extracted from JAAD dashcam frames using a fine-tuned YOLO26m-Pose model. Sequences are anchored to each pedestrian's `crossing_point` annotation with a 1-second prediction horizon: the model predicts intent at least 1 second before the crossing event occurs.
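Anchoring a window to the crossing point with a prediction horizon comes down to simple frame arithmetic. Below is a minimal sketch. It assumes consecutive frames, a 30 fps frame rate, and that the last observed frame sits exactly one horizon before `crossing_point`. `crossing_window` is a hypothetical helper, not part of PedSense, and the real preprocessing may sample, stride, or pad differently.

```python
def crossing_window(crossing_frame: int, seq_len: int = 5,
                    horizon_s: float = 1.0, fps: float = 30.0) -> list[int]:
    """Hypothetical helper: frame indices of a seq_len-frame window whose
    last frame lies horizon_s seconds before crossing_frame."""
    end = crossing_frame - int(round(horizon_s * fps))  # last observed frame
    start = end - seq_len + 1                           # first observed frame
    if start < 0:
        raise ValueError("not enough frames before the crossing point")
    return list(range(start, end + 1))


print(crossing_window(100))  # [66, 67, 68, 69, 70]
```

With these assumptions, a crossing at frame 100 gives the model frames 66 to 70. Nothing after frame 70 is visible to the model, which is what the 1-second horizon guarantees.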
Model Details
| Property | Value |
|---|---|
| Architecture | KeypointLSTM (Linear projection + 2-layer LSTM + classifier) |
| Task | Binary classification (not-crossing / crossing) |
| Input | (T=5, 34): 5 frames × 17 keypoints × 2 coordinates (flattened) |
| Dataset | JAAD |
| Epochs | 50 |
| Batch Size | 16 |
| Hidden Size | 128 |
| LSTM Layers | 2 |
| Dropout | 0.3 |
| Learning Rate | 1e-3 (AdamW + CosineAnnealingLR) |
| Train/Val Split | 80/20 (video-level) |
| Prediction Horizon | 1 second before crossing |
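The table gives the layer layout (linear projection, then a 2-layer LSTM, then a classifier) but not the exact wiring. Below is a minimal PyTorch sketch of that layout. It assumes a ReLU after the projection, dropout between the stacked LSTM layers and before the head, and classification from the last time step. None of these choices are confirmed by this card. `KeypointLSTMSketch` is a hypothetical stand-in, not `pedsense.train.resnet_lstm.KeypointLSTM`, so its parameter names will not match `best.pt`.

```python
import torch
import torch.nn as nn


class KeypointLSTMSketch(nn.Module):
    """Hypothetical re-creation of the tabled layout, not the PedSense source."""

    def __init__(self, input_size: int = 34, hidden_size: int = 128,
                 num_layers: int = 2, dropout: float = 0.3, num_classes: int = 2):
        super().__init__()
        # Project the 34 flattened coordinates to the LSTM width (ReLU is assumed).
        self.proj = nn.Sequential(nn.Linear(input_size, hidden_size), nn.ReLU())
        # nn.LSTM applies `dropout` between stacked layers when num_layers > 1.
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Sequential(nn.Dropout(dropout),
                                  nn.Linear(hidden_size, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, 34) -> logits: (batch, num_classes)
        out, _ = self.lstm(self.proj(x))   # out: (batch, T, hidden_size)
        return self.head(out[:, -1])       # classify from the last time step


model = KeypointLSTMSketch()
logits = model(torch.randn(16, 5, 34))  # one batch of 16 sequences, T=5
print(logits.shape)  # torch.Size([16, 2])
```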
Performance (Best checkpoint by validation F1)
| Metric | Value |
|---|---|
| Best Val F1 (crossing) | 0.476 |
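"F1 (crossing)" is the harmonic mean of precision and recall when crossing is treated as the positive class. Below is a dependency-free sketch. It assumes labels are encoded as 0 = not-crossing and 1 = crossing, which matches `probs[1]` in the usage snippet. Selecting the best checkpoint would then mean keeping the epoch where this value is highest on the validation split.

```python
def f1_for_class(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 score of a single class (here: crossing = 1), computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# tp = 2, fp = 1, fn = 1 -> precision = recall = 2/3 -> F1 = 2/3
print(f1_for_class([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```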
Keypoint Normalization
Each joint (kx, ky) is normalized relative to the pedestrian bounding box:
kx_norm = (kx - cx) / h
ky_norm = (ky - cy) / h
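Here (cx, cy) is the bounding box center and h is the bounding box height. This removes the pedestrian's position in the frame and apparent size (distance from the camera), which makes the sequences translation- and scale-invariant.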
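The two formulas translate directly into NumPy. Below is a minimal sketch. It assumes the box is given as pixel corners `(x1, y1, x2, y2)`, and `normalize_keypoints` is a hypothetical name, not a PedSense function. The final check shows scale invariance: scaling the whole scene by 2 leaves the normalized skeleton unchanged.

```python
import numpy as np


def normalize_keypoints(kpts: np.ndarray, bbox: tuple) -> np.ndarray:
    """Normalize (17, 2) pixel keypoints by box center and height.

    bbox is assumed to be (x1, y1, x2, y2) in pixels.
    """
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # box center
    h = y2 - y1                                # box height
    if h <= 0:
        raise ValueError("bounding box height must be positive")
    return (kpts - np.array([cx, cy])) / h     # broadcast over all 17 joints


box = (100.0, 20.0, 140.0, 120.0)       # center (120, 70), height 100
kpts = np.full((17, 2), [110.0, 40.0])  # toy skeleton: every joint at one pixel
a = normalize_keypoints(kpts, box)
b = normalize_keypoints(kpts * 2, tuple(v * 2 for v in box))  # same scene, 2x scale
print(a[0], np.allclose(a, b))  # [-0.1 -0.3] True
```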
COCO Keypoint Order (17 points)
| Index | Keypoint |
|---|---|
| 0 | Nose |
| 1 | Left eye |
| 2 | Right eye |
| 3 | Left ear |
| 4 | Right ear |
| 5 | Left shoulder |
| 6 | Right shoulder |
| 7 | Left elbow |
| 8 | Right elbow |
| 9 | Left wrist |
| 10 | Right wrist |
| 11 | Left hip |
| 12 | Right hip |
| 13 | Left knee |
| 14 | Right knee |
| 15 | Left ankle |
| 16 | Right ankle |
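The model consumes each frame as a flat vector of 34 values. Below is a sketch of how the (T, 17, 2) array maps onto (T, 34) in this keypoint order. It assumes a NumPy row-major reshape, which interleaves coordinates per joint: column 2k is x and column 2k+1 is y of keypoint k. The card does not state the actual flattening order used by PedSense. `COCO_KEYPOINTS` and `flatten_sequence` are illustrative names.

```python
import numpy as np

# COCO order from the table above (index = position in this list).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def flatten_sequence(seq: np.ndarray) -> np.ndarray:
    """(T, 17, 2) -> (T, 34) with a row-major reshape: [x0, y0, x1, y1, ...]."""
    return seq.reshape(seq.shape[0], -1)


seq = np.random.rand(5, 17, 2).astype(np.float32)
flat = flatten_sequence(seq)
k = COCO_KEYPOINTS.index("left_ankle")  # 15
print(flat.shape, flat[0, 2 * k] == seq[0, k, 0], flat[0, 2 * k + 1] == seq[0, k, 1])
```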
Usage
```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download

from pedsense.train.resnet_lstm import KeypointLSTM

# Download the trained weights
weights_path = hf_hub_download(repo_id="JcProg/pedsense-keypoint-lstm-jaad-50e", filename="best.pt")

# Load the model with the hyperparameters it was trained with
model = KeypointLSTM(input_size=34, hidden_size=128, num_layers=2, dropout=0.3)
model.load_state_dict(torch.load(weights_path, map_location="cpu", weights_only=True))
model.eval()

# Run inference on one normalized sequence of shape (T=5, 34).
# Placeholder input: replace with real keypoints normalized as described above.
keypoints_normalized = np.zeros((5, 34), dtype=np.float32)
seq = torch.from_numpy(keypoints_normalized).float().unsqueeze(0)  # (1, 5, 34)
with torch.no_grad():
    probs = torch.softmax(model(seq), dim=1)[0]  # [p(not-crossing), p(crossing)]
label = "crossing" if probs[1] > 0.5 else "not-crossing"
print(label, probs.tolist())
```
Training
Sequences were built with `pedsense preprocess keypoints`, and the model was then trained with the PedSense-AI framework:
```bash
# Step 1: Extract frames
uv run pedsense preprocess frames
# Step 2: Build keypoint sequences (YOLO-Pose → JAAD alignment → (T,17,2) arrays)
uv run pedsense preprocess keypoints --sequence-length 5 --prediction-horizon 1.0
# Step 3: Train KeypointLSTM
uv run pedsense train -m keypoint-lstm -n my_lstm -e 50 -b 16
```
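The training settings in the Model Details table are an 80/20 video-level split, AdamW at 1e-3, and cosine annealing over 50 epochs. Below is a rough sketch of how they could be wired up in plain PyTorch. Several parts are assumptions:

- `video_level_split` is a hypothetical helper, and its seed is arbitrary.
- The scheduler is assumed to step once per epoch with `T_max=50`.
- A bare `nn.LSTM` stands in for the real model.

Splitting by video rather than by sequence keeps frames from the same clip out of both sets, which could otherwise inflate validation F1.

```python
import random

import torch
import torch.nn as nn


def video_level_split(video_ids, val_frac=0.2, seed=0):
    """Hypothetical helper: split sample indices so each video lands in exactly one set."""
    unique = sorted(set(video_ids))
    random.Random(seed).shuffle(unique)
    val_videos = set(unique[: int(round(len(unique) * val_frac))])
    train_idx = [i for i, v in enumerate(video_ids) if v not in val_videos]
    val_idx = [i for i, v in enumerate(video_ids) if v in val_videos]
    return train_idx, val_idx


ids = ["video_0001", "video_0001", "video_0002", "video_0003",
       "video_0003", "video_0004", "video_0005"]  # toy per-sequence video IDs
train_idx, val_idx = video_level_split(ids)

# Optimizer and schedule from the table; nn.LSTM is only a stand-in model.
model = nn.LSTM(34, 128, num_layers=2, batch_first=True, dropout=0.3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

lrs = []
for epoch in range(50):
    # ... one pass over the training batches (batch_size=16) would go here ...
    optimizer.step()   # placeholder step so the scheduler follows an optimizer step
    scheduler.step()   # once per epoch: the LR decays from 1e-3 toward 0 over 50 epochs
    lrs.append(scheduler.get_last_lr()[0])
```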
License
AGPL-3.0
Related Models
- pedsense-yolo26m-pose-jaad-10e -- YOLO26m-Pose keypoint detector used for sequence extraction
- pedsense-yolo11m-pose-jaad-10e -- YOLO11m-Pose keypoint model
- pedsense-yolo26m-detector-jaad-20e -- 1-class pedestrian detector (20 epochs)
- pedsense-yolo26m-detector-jaad-aug-10e -- augmented detector (10 epochs)
- pedsense-yolo26-jaad-10e -- 2-class crossing intent classifier