---
language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1
---

# Driver Behavior Detection Model (Epoch 7)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

## Model Description

- **Architecture**: Video Swin Transformer Tiny (swin3d_t)
- **Backbone Pretraining**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)

## Classes (5)

| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | 정상 (Normal) | 0.97 |
| 1 | 졸음운전 (Drowsy Driving) | 0.99 |
| 2 | 물건찾기 (Reaching/Searching) | 0.96 |
| 3 | 휴대폰 사용 (Phone Usage) | 0.96 |
| 4 | 운전자 폭행 (Driver Assault) | 1.00 |
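
A minimal post-processing sketch that turns one clip's output into a class. It assumes the model emits one raw logit per class, in the label order of the table above. The `ID2LABEL` mapping and the `decode_prediction` helper are illustrative names, not part of `model.py` or `config.json`.

```python
import math

# Hypothetical id -> name mapping, following the label order of the class table.
ID2LABEL = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (label_id, class_name, probability) for one clip's 5 logits."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, ID2LABEL[label_id], probs[label_id]


print(decode_prediction([0.1, 2.5, -1.0, 0.3, -2.0]))
```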

## Performance (Epoch 7)

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.05% |
| **Macro F1** | 0.9757 |
| **Validation Samples** | 1,371,062 |
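
Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many windows it contains. As a rough check, averaging the rounded per-class scores in the Classes table gives (0.97 + 0.99 + 0.96 + 0.96 + 1.00) / 5 = 0.976. This is in line with the reported 0.9757.

The pure-Python sketch below shows the metric. The function names are illustrative and are not taken from this repository's evaluation code.

```python
def per_class_f1(y_true, y_pred, num_classes=5):
    """F1 = 2TP / (2TP + FP + FN) for each class id in [0, num_classes).

    A class absent from both lists scores 0.0 (a convention choice).
    """
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, num_classes)) / num_classes


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 1, 2, 3, 4]
y_pred = [0, 1, 1, 2, 3, 4]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```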

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48 GB each) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 per GPU x 2 GPUs) |
| Gradient Accumulation | 4 steps |
| Effective Batch | 128 (32 x 4) |
| Optimizer | AdamW (lr=1e-3, weight decay=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy with label smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
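
The loss and regularization rows can be combined as follows:

- **Label smoothing** with ε = 0.1 replaces each one-hot target with (1 - ε)·onehot + ε/K. This is the convention used by PyTorch's `CrossEntropyLoss(label_smoothing=...)`.
- **Mixup** draws λ ~ Beta(0.4, 0.4) and blends both two clips and their targets.

The pure-Python sketch below illustrates that combination on flattened lists. It is an assumption about how the pieces fit together, not the repository's training code, and every function name in it is hypothetical.

```python
import math
import random

NUM_CLASSES = 5
EPS = 0.1          # label smoothing factor from the table
MIXUP_ALPHA = 0.4  # Beta(alpha, alpha) parameter for the mixing weight


def smooth_target(label, num_classes=NUM_CLASSES, eps=EPS):
    """(1 - eps) * one_hot(label) + eps / num_classes."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]


def mixup(x_a, x_b, label_a, label_b, lam):
    """Blend two flattened clips and their smoothed targets with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    t_a, t_b = smooth_target(label_a), smooth_target(label_b)
    target = [lam * a + (1.0 - lam) * b for a, b in zip(t_a, t_b)]
    return x, target


def soft_cross_entropy(logits, target):
    """-sum_k target_k * log_softmax(logits)_k, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for z, t in zip(logits, target))


lam = random.betavariate(MIXUP_ALPHA, MIXUP_ALPHA)
x, target = mixup([0.2, 0.4], [0.8, 0.0], label_a=1, label_b=3, lam=lam)
loss = soft_cross_entropy([0.0, 1.5, -0.5, 0.3, 0.0], target)
```

Cross-entropy is linear in the target, so training on the mixed target is equivalent to λ·CE(y_a) + (1 - λ)·CE(y_b) with smoothed targets.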

## Files

| File | Size | Description |
|------|:----:|-------------|
| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
| `model.onnx` | 164 MB | ONNX model for mobile deployment |
| `config.json` | 1.2 KB | Model configuration |
| `model.py` | 6.9 KB | Model architecture code |
| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |
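
## Platform-specific Usage

### PyTorch (Server/Desktop)

```python
import torch
from model import DriverBehaviorModel  # model.py from this repository

# Build the architecture, then load the fine-tuned weights (stored under the "model" key).
model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Placeholder clip; replace with real frames preprocessed as in "Preprocessing (All Platforms)".
clip = torch.randn(1, 3, 30, 224, 224)
with torch.no_grad():
    logits = model(clip)  # assumes forward() returns class logits of shape [1, 5]
pred = logits.argmax(dim=1).item()  # label id, see the Classes table
```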

### iOS (CoreML)

1. Copy `model.onnx` to a Mac.
2. Run the conversion script:

   ```bash
   python convert_coreml_macos.py
   ```

3. Add the generated `DriverBehavior.mlpackage` to your Xcode project.

### Android (ONNX Runtime)

```kotlin
// build.gradle (app module):
// implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

import ai.onnxruntime.OrtEnvironment

val env = OrtEnvironment.getEnvironment()
val session = env.createSession(assetManager.open("model.onnx").readBytes())

// inputTensor: OnnxTensor of shape [1, 3, 30, 224, 224], preprocessed as described below
session.run(mapOf("video_input" to inputTensor)).use { output ->
    val logits = (output[0].value as Array<FloatArray>)[0]  // one score per class label
}
```

## Preprocessing (All Platforms)

```
Input Shape:   [1, 3, 30, 224, 224] (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std, per channel
  - mean = [0.485, 0.456, 0.406]
  - std  = [0.229, 0.224, 0.225]
Resize:        224x224 (BILINEAR)
Frames:        30 frames, uniformly sampled
```
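
A NumPy sketch of the preprocessing steps above. It assumes the decoded frames are already RGB `uint8` arrays resized to 224x224; the bilinear resize can be done beforehand with, for example, Pillow or OpenCV. Frame indices are spread evenly over the clip with `np.linspace` and rounded. That rounding scheme and the helper names are assumptions, not taken from this repository.

```python
import numpy as np

NUM_FRAMES = 30
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def sample_indices(total_frames, num_frames=NUM_FRAMES):
    """Pick num_frames indices evenly spaced over [0, total_frames - 1]."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(np.int64)


def preprocess(frames):
    """frames: uint8 RGB array [T, H, W, 3] -> contiguous float32 [1, 3, 30, H, W]."""
    idx = sample_indices(len(frames))
    clip = frames[idx].astype(np.float32) / 255.0  # [30, H, W, 3], values in [0, 1]
    clip = (clip - MEAN) / STD                     # per-channel normalization
    clip = clip.transpose(3, 0, 1, 2)              # -> [3, 30, H, W]
    return np.ascontiguousarray(clip[np.newaxis], dtype=np.float32)  # add batch axis


video = np.random.randint(0, 256, size=(90, 224, 224, 3), dtype=np.uint8)
x = preprocess(video)
print(x.shape, x.dtype)
```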

## Dataset

- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224x224
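
With a 30-frame window and a 15-frame stride, consecutive windows overlap by half. A video of N ≥ 30 frames therefore yields floor((N - 30) / 15) + 1 windows.

The helper below sketches that sliding-window rule. The card does not say whether trailing frames, or videos shorter than 30 frames, were padded or dropped. This version simply drops them, and that choice is an assumption.

```python
WINDOW = 30
STRIDE = 15


def window_starts(num_frames, window=WINDOW, stride=STRIDE):
    """Start indices of full windows; leftover tail frames are dropped (assumption)."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


def num_windows(num_frames, window=WINDOW, stride=STRIDE):
    """Number of full windows a video of num_frames frames produces."""
    return len(window_starts(num_frames, window, stride))


for n in (29, 30, 44, 45, 100):
    print(n, num_windows(n), window_starts(n))
```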

## Training Progress

| Epoch | Accuracy | Macro F1 |
|:-----:|:--------:|:--------:|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| **7** | **98.05%** | **0.9757** |

## License

This model is for research purposes only.

## Citation

```
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```