---
language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1
---
# Driver Behavior Detection Model (Epoch 7)
A Video Swin Transformer-based model for detecting abnormal driver behavior.
## Model Description
- **Architecture**: Video Swin Transformer Tiny (swin3d_t)
- **Backbone Pretrained**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)
## Classes (5)
| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | 정상 (Normal) | 0.97 |
| 1 | 졸음운전 (Drowsy Driving) | 0.99 |
| 2 | 물건찾기 (Reaching/Searching) | 0.96 |
| 3 | 휴대폰 사용 (Phone Usage) | 0.96 |
| 4 | 운전자 폭행 (Driver Assault) | 1.00 |
## Performance (Epoch 7)
| Metric | Value |
|--------|-------|
| **Accuracy** | 98.05% |
| **Macro F1** | 0.9757 |
| **Validation Samples** | 1,371,062 |
## Training Configuration
| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 per GPU × 2 GPUs) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α = 0.4), Dropout (0.3) |
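The effective batch of 128 follows from 16 samples per GPU × 2 GPUs × 4 accumulation steps. A minimal single-process sketch of the accumulation loop (the `Linear` model and random data are stand-ins for illustration; the real run uses DDP and the Swin backbone):

```python
import torch

# Per the table: per-GPU batch 16, accumulation 4 -> each optimizer step
# effectively sees 16 x 2 GPUs x 4 = 128 samples in the real DDP run.
ACCUM_STEPS = 4

model = torch.nn.Linear(10, 5)  # stand-in for the swin3d_t backbone
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

losses = []
for step in range(8):  # stand-in for the real dataloader
    x = torch.randn(16, 10)
    y = torch.randint(0, 5, (16,))
    loss = loss_fn(model(x), y) / ACCUM_STEPS  # scale so grads average out
    loss.backward()                            # grads accumulate across calls
    losses.append(loss.item())
    if (step + 1) % ACCUM_STEPS == 0:          # step once per 4 micro-batches
        opt.step()
        opt.zero_grad()
print(len(losses))
```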
## Files
| File | Size | Description |
|------|:----:|-------------|
| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
| `model.onnx` | 164 MB | ONNX model for mobile deployment |
| `config.json` | 1.2 KB | Model configuration |
| `model.py` | 6.9 KB | Model architecture code |
| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |
## Platform-specific Usage
### PyTorch (Server/Desktop)
```python
import torch
from model import DriverBehaviorModel
model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()
```
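Once loaded, inference is the usual pattern: run the preprocessed clip through the model and take the softmax argmax. A hedged sketch, where `CLASS_NAMES` mirrors the class table above and the lambda is a hypothetical stand-in for the loaded `DriverBehaviorModel`:

```python
import torch

# English class names from the class table above (label order 0-4).
CLASS_NAMES = ["Normal", "Drowsy Driving", "Reaching/Searching",
               "Phone Usage", "Driver Assault"]

@torch.no_grad()
def predict(model, clip: torch.Tensor):
    # clip: preprocessed tensor of shape [1, 3, 30, 224, 224]
    probs = torch.softmax(model(clip), dim=1)
    conf, idx = probs.max(dim=1)
    return CLASS_NAMES[idx.item()], conf.item()

# Hypothetical stand-in: anything mapping the clip to 5 logits works here;
# replace with the DriverBehaviorModel instance loaded above.
stand_in = lambda x: x.flatten(1)[:, :5]
label, conf = predict(stand_in, torch.randn(1, 3, 30, 224, 224))
print(label)
```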
### iOS (CoreML)
1. Copy `model.onnx` to macOS
2. Run conversion script:
```bash
python convert_coreml_macos.py
```
3. Add generated `DriverBehavior.mlpackage` to Xcode project
### Android (ONNX Runtime)
```groovy
// build.gradle
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
```

```kotlin
val session = OrtEnvironment.getEnvironment()
    .createSession(assetManager.open("model.onnx").readBytes())
val output = session.run(mapOf("video_input" to inputTensor))
```
## Preprocessing (All Platforms)
```
Input Shape: [1, 3, 30, 224, 224] (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
- mean = [0.485, 0.456, 0.406]
- std = [0.229, 0.224, 0.225]
Resize: 224x224 (BILINEAR)
Frames: 30 frames uniformly sampled
```
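The spec above can be sketched in NumPy. Frames are assumed to be RGB and already resized to 224×224; a real pipeline would add the bilinear resize (e.g. via OpenCV or torchvision):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def sample_indices(num_frames: int, num_samples: int = 30) -> np.ndarray:
    # Uniformly sample `num_samples` frame indices across the whole clip.
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)

def preprocess(frames: np.ndarray) -> np.ndarray:
    # frames: [T, 224, 224, 3] uint8 RGB, already resized to 224x224.
    clip = frames[sample_indices(len(frames))].astype(np.float32) / 255.0
    clip = (clip - MEAN) / STD          # per-channel normalization
    clip = clip.transpose(3, 0, 1, 2)   # [30,224,224,3] -> [3,30,224,224]
    return clip[np.newaxis]             # add batch dim -> [1,3,30,224,224]

video = np.random.randint(0, 256, (90, 224, 224, 3), dtype=np.uint8)
x = preprocess(video)
print(x.shape)  # (1, 3, 30, 224, 224)
```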
## Dataset
- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224x224
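The sample count follows from slicing each video into 30-frame windows with a stride of 15; a small sketch of the window arithmetic:

```python
def num_windows(total_frames: int, window: int = 30, stride: int = 15) -> int:
    # Count how many full sliding windows fit inside a clip.
    if total_frames < window:
        return 0
    return (total_frames - window) // stride + 1

print(num_windows(300))  # a 300-frame clip yields 19 overlapping windows
```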
## Training Progress
| Epoch | Accuracy | Macro F1 |
|:-----:|:--------:|:--------:|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| **7** | **98.05%** | **0.9757** |
## License
Released under the Apache License 2.0 (see the model card metadata); intended for research purposes.
## Citation
```bibtex
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```