	# Driver Behavior Detection Model (Epoch 2)

	A Video Swin Transformer-based model for detecting abnormal driver behavior.

	## Model Description

	- Architecture: Video Swin Transformer Tiny (swin3d_t)
	- Backbone Pretrained: Kinetics-400
	- Parameters: 27.85M
	- Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)

	## Classes (5)

	| Label | Class | F1-Score |
	|:-----:|-------|:--------:|
	| 0 | 정상 (Normal) | 0.93 |
	| 1 | 졸음운전 (Drowsy Driving) | 0.98 |
	| 2 | 물건찾기 (Reaching/Searching) | 0.90 |
	| 3 | 휴대폰 사용 (Phone Usage) | 0.88 |
	| 4 | 운전자 폭행 (Driver Assault) | 1.00 |

	## Performance (Epoch 2)

	| Metric | Value |
	|--------|-------|
	| Accuracy | 95.15% |
	| Macro F1 | 0.9392 |
	| Validation Samples | 1,371,062 |
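
Macro F1 is the unweighted mean of the per-class F1 scores. As a quick sanity check, the sketch below averages the five rounded values from the Classes table. The result differs slightly from the reported 0.9392 because the table values are rounded to two decimals.

```python
# Per-class F1 scores as listed in the Classes table (rounded to 2 decimals).
per_class_f1 = {
    "Normal": 0.93,
    "Drowsy Driving": 0.98,
    "Reaching/Searching": 0.90,
    "Phone Usage": 0.88,
    "Driver Assault": 1.00,
}

# Macro F1: every class counts equally, regardless of how many samples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.938
```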

	## Training Configuration

	| Parameter | Value |
	|-----------|-------|
	| Hardware | 2x NVIDIA RTX A6000 (48GB) |
	| Distributed | DDP (DistributedDataParallel) |
	| Batch Size | 32 (16 × 2 GPU) |
	| Gradient Accumulation | 4 |
	| Effective Batch | 128 |
	| Optimizer | AdamW (lr=1e-3, wd=0.05) |
	| Scheduler | OneCycleLR |
	| Mixed Precision | FP16 |
	| Loss | CrossEntropy + Label Smoothing (0.1) |
	| Regularization | Mixup (α=0.4), Dropout (0.3) |
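
The optimizer, scheduler, loss, and gradient accumulation settings in the table could be wired together roughly as follows. This is a single-process CPU sketch under stated assumptions: the tiny linear model and random data are placeholders, `max_lr=1e-3` assumes the listed learning rate is the OneCycleLR peak, and DDP and FP16 autocast are left out for brevity.

```python
import torch
from torch import nn

per_gpu_batch, num_gpus, accum_steps = 16, 2, 4
effective_batch = per_gpu_batch * num_gpus * accum_steps  # 16 * 2 * 4 = 128

# Placeholder model and step count so the sketch runs anywhere.
model = nn.Linear(8, 5)
total_micro_batches = 40  # hypothetical; a real run would iterate a DataLoader
optimizer_steps = total_micro_batches // accum_steps

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
# Assumption: the listed lr is used as the peak of the one-cycle schedule.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=optimizer_steps
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

optimizer.zero_grad()
for step in range(total_micro_batches):
    x = torch.randn(per_gpu_batch, 8)            # random stand-in features
    y = torch.randint(0, 5, (per_gpu_batch,))    # random stand-in labels
    # Divide so the accumulated gradient is an average over micro-batches.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()  # one scheduler step per optimizer step
        optimizer.zero_grad()
```

Stepping the scheduler once per optimizer step, rather than once per micro-batch, keeps `total_steps` aligned with the number of weight updates.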

	## Usage

	```python
	import torch
	from model import DriverBehaviorModel

	# Load model
	model = DriverBehaviorModel(num_classes=5, pretrained=False)
	checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
	model.load_state_dict(checkpoint["model"])
	model.eval()

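	# Inference
	# video_tensor: [1, 3, 30, 224, 224] - 30 frames, 224x224, RGB normalized
	video_tensor = torch.randn(1, 3, 30, 224, 224)  # placeholder; replace with a preprocessed clip
	with torch.no_grad():
	    output = model(video_tensor)       # class logits, one row per clip
	    prediction = output.argmax(dim=1)  # predicted label index (0-4)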
	```
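
To turn the predicted index into a readable class name, a small lookup following the label order of the Classes table is enough. The names `ID2LABEL` and `decode_prediction` are hypothetical helpers, not part of the repository, and the example logits are made up.

```python
import torch

# Label order follows the Classes table in this card.
ID2LABEL = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def decode_prediction(logits: torch.Tensor) -> list:
    """Return (class name, softmax confidence) for each clip in the batch."""
    probs = logits.softmax(dim=1)
    conf, idx = probs.max(dim=1)
    return [(ID2LABEL[i], c) for i, c in zip(idx.tolist(), conf.tolist())]


# Made-up logits for one clip; index 1 has the largest value.
example_logits = torch.tensor([[0.1, 2.5, 0.3, -1.0, 0.0]])
decoded = decode_prediction(example_logits)
print(decoded[0][0])  # Drowsy Driving
```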

	## Dataset

	- Total Videos: 243,979
	- Total Samples (windows): 1,371,062
	- Window Size: 30 frames
	- Stride: 15 frames
	- Resolution: 224×224
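
With a 30-frame window and a 15-frame stride, consecutive windows overlap by half. The sketch below shows how window start indices could be derived for one video. The function name `window_starts` is hypothetical, and dropping a trailing partial window is an assumption, since the card does not say how leftover frames are handled.

```python
def window_starts(num_frames: int, window: int = 30, stride: int = 15) -> list:
    """Start indices of every full window in a video of num_frames frames."""
    if num_frames < window:
        return []  # assumption: videos shorter than one window yield no samples
    return list(range(0, num_frames - window + 1, stride))


print(window_starts(100))  # [0, 15, 30, 45, 60]

# Average number of windows per video implied by the dataset totals above.
avg_windows_per_video = 1_371_062 / 243_979
print(f"{avg_windows_per_video:.2f}")
```

The dataset totals work out to between five and six windows per video on average.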

	## Augmentation (Training)

	- RandomResizedCrop (scale 0.8-1.0)
	- HorizontalFlip (p=0.5)
	- ColorJitter, HueSaturationValue
	- Temporal Augmentation (speed change, frame drop)
	- Mixup (α=0.4)
	- CoarseDropout
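
Of the augmentations listed, Mixup is the one that also changes the training targets. The following is a minimal sketch of standard batch-level Mixup with α=0.4, assuming one mixing coefficient per batch. The card does not specify the actual implementation, and the small 32×32 clips are placeholders to keep the example light.

```python
import torch
import torch.nn.functional as F


def mixup(clips: torch.Tensor, labels: torch.Tensor, num_classes: int = 5, alpha: float = 0.4):
    """Blend each clip and its one-hot target with a randomly paired clip."""
    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(clips.size(0))
    mixed_clips = lam * clips + (1.0 - lam) * clips[perm]
    targets = F.one_hot(labels, num_classes).float()
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_clips, mixed_targets


clips = torch.randn(4, 3, 30, 32, 32)  # placeholder batch: [B, C, T, H, W]
labels = torch.tensor([0, 1, 2, 4])
mixed_clips, mixed_targets = mixup(clips, labels)
print(mixed_clips.shape, mixed_targets.sum(dim=1))
```

The soft targets can be passed directly to `nn.CrossEntropyLoss`, which accepts class probabilities as targets in recent PyTorch versions.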

	## License

	This model is for research purposes only.

	## Citation

	```
	@misc{driver-behavior-detection-2026,
	title={Driver Behavior Detection using Video Swin Transformer},
	author={C-Team},
	year={2026}
	}
	```