---
language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1
---
# Driver Behavior Detection Model (Epoch 7)
A Video Swin Transformer-based model for detecting abnormal driver behavior.
## Model Description
- **Architecture**: Video Swin Transformer Tiny (swin3d_t)
- **Backbone Pretrained**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)
## Classes (5)
| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | 정상 (Normal) | 0.97 |
| 1 | 졸음운전 (Drowsy Driving) | 0.99 |
| 2 | 물건찾기 (Reaching/Searching) | 0.96 |
| 3 | 휴대폰 사용 (Phone Usage) | 0.96 |
| 4 | 운전자 폭행 (Driver Assault) | 1.00 |
## Performance (Epoch 7)
| Metric | Value |
|--------|-------|
| **Accuracy** | 98.05% |
| **Macro F1** | 0.9757 |
| **Validation Samples** | 1,371,062 |
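
Macro F1 is the unweighted mean of the per-class F1 scores. As a rough sanity check, the sketch below averages the two-decimal values from the Classes table; because those inputs are rounded, the result only approximates the reported 0.9757. The names `per_class_f1` and `macro_f1` are illustrative, not part of the released code.

```python
# Per-class F1 scores as listed (rounded to two decimals) in the Classes table.
per_class_f1 = {
    "Normal": 0.97,
    "Drowsy Driving": 0.99,
    "Reaching/Searching": 0.96,
    "Phone Usage": 0.96,
    "Driver Assault": 1.00,
}


def macro_f1(scores):
    """Return the unweighted mean of the per-class F1 scores."""
    values = list(scores.values())
    return sum(values) / len(values)


approx_macro_f1 = macro_f1(per_class_f1)
print(f"{approx_macro_f1:.3f}")  # prints 0.976
```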
## Training Configuration
| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 x 2 GPU) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
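
To illustrate how Mixup and label smoothing from the table can interact, here is a minimal NumPy sketch. It is not the training code used for this model: the function names, the batch-level (single lambda) mixing, and the way soft targets are built are all assumptions. Only the values 0.4 and 0.1 and the class count of 5 come from this card.

```python
import numpy as np

NUM_CLASSES = 5        # from the Classes table
MIXUP_ALPHA = 0.4      # from the Training Configuration table
LABEL_SMOOTHING = 0.1  # from the Training Configuration table


def smooth_one_hot(labels, num_classes=NUM_CLASSES, eps=LABEL_SMOOTHING):
    """Soft targets: (1 - eps) * one_hot + eps / num_classes."""
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return one_hot * (1.0 - eps) + eps / num_classes


def mixup(clips, targets, alpha=MIXUP_ALPHA, rng=None):
    """Blend every clip and its soft target with a randomly chosen partner.

    Assumption: one lambda ~ Beta(alpha, alpha) is drawn per batch.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(clips))
    mixed_clips = lam * clips + (1.0 - lam) * clips[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_clips, mixed_targets, lam
```

The mixed targets remain valid probability distributions (each row sums to 1), so they can be fed to a cross-entropy loss that accepts soft labels.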
## Files
| File | Size | Description |
|------|:----:|-------------|
| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
| `model.onnx` | 164 MB | ONNX model for mobile deployment |
| `config.json` | 1.2 KB | Model configuration |
| `model.py` | 6.9 KB | Model architecture code |
| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |
## Platform-specific Usage
### PyTorch (Server/Desktop)
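```python
import torch
from model import DriverBehaviorModel  # architecture code shipped as model.py

# pretrained=False: the fine-tuned weights are loaded from the checkpoint below.
model = DriverBehaviorModel(num_classes=5, pretrained=False)

# The checkpoint stores the state dict under the "model" key.
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()  # switch to inference mode (disables dropout)
```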
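
Once the model is loaded, its output still has to be mapped back to a class name. The sketch below assumes the model returns one raw logit per class (shape `[B, 5]`) in the label order of the Classes table; that output format is an assumption, and `softmax`, `decode`, and `example_logits` are hypothetical names used only for illustration.

```python
import math

# Label order taken from the Classes table (index = label).
CLASS_NAMES = [
    "Normal",
    "Drowsy Driving",
    "Reaching/Searching",
    "Phone Usage",
    "Driver Assault",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label index, class name, probability) of the top class."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, CLASS_NAMES[label], probs[label]


# Hypothetical logits for one clip, e.g. model(clip)[0].tolist() in PyTorch.
example_logits = [0.2, 3.1, -0.5, 0.0, -1.2]
label, name, confidence = decode(example_logits)
print(label, name)  # prints: 1 Drowsy Driving
```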
### iOS (CoreML)
1. Copy `model.onnx` to a macOS machine.
2. Run the conversion script:
```bash
python convert_coreml_macos.py
```
3. Add the generated `DriverBehavior.mlpackage` to your Xcode project.
### Android (ONNX Runtime)
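```kotlin
// build.gradle (app module), Groovy DSL:
//   implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

import ai.onnxruntime.OrtEnvironment

// Load the ONNX model from the app's assets and create an inference session.
val session = OrtEnvironment.getEnvironment()
    .createSession(assetManager.open("model.onnx").readBytes())
// "video_input" is the model's input name; inputTensor is an OnnxTensor
// of shape [1, 3, 30, 224, 224] built as described under Preprocessing.
val output = session.run(mapOf("video_input" to inputTensor))
```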
## Preprocessing (All Platforms)
```
Input Shape: [1, 3, 30, 224, 224] (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
- mean = [0.485, 0.456, 0.406]
- std = [0.229, 0.224, 0.225]
Resize: 224x224 (BILINEAR)
Frames: 30 frames uniformly sampled
```
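
The specification above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the project's own pipeline: frames are assumed to be RGB `uint8` arrays already resized to 224x224, "uniformly sampled" is interpreted as evenly spaced indices across the clip, and the function names are hypothetical.

```python
import numpy as np

NUM_FRAMES = 30
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def sample_frame_indices(total_frames, num_frames=NUM_FRAMES):
    """Evenly spaced frame indices from the first to the last frame."""
    positions = np.linspace(0, total_frames - 1, num_frames)
    return positions.round().astype(np.int64)


def preprocess_clip(frames):
    """Convert frames [T, 224, 224, 3] (uint8, RGB) to [1, 3, 30, 224, 224] float32."""
    idx = sample_frame_indices(len(frames))
    clip = frames[idx].astype(np.float32) / 255.0
    clip = (clip - MEAN) / STD          # broadcasts over the last (channel) axis
    clip = clip.transpose(3, 0, 1, 2)   # [T, H, W, C] -> [C, T, H, W]
    return np.ascontiguousarray(clip[np.newaxis, ...])  # add the batch axis
```

The resulting array can be wrapped with `torch.from_numpy` for the PyTorch model or passed to an ONNX Runtime session as the `video_input` tensor.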
## Dataset
- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224x224
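
With a 30-frame window and a 15-frame stride, consecutive samples overlap by half, and the totals above work out to roughly 5.6 windows per video on average. The sketch below shows one way such windows could be enumerated; dropping trailing frames that do not fill a complete window is an assumption, and `window_starts` is a hypothetical helper.

```python
def window_starts(num_frames, window=30, stride=15):
    """Start indices of all full windows in a video of num_frames frames.

    Assumption: trailing frames that do not fill a whole window are dropped.
    """
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


# A 90-frame video yields five half-overlapping 30-frame windows.
print(window_starts(90))  # prints [0, 15, 30, 45, 60]
```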
## Training Progress
| Epoch | Accuracy | Macro F1 |
|:-----:|:--------:|:--------:|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| **7** | **98.05%** | **0.9757** |
## License
This model is for research purposes only.
## Citation
```
@misc{driver-behavior-detection-2026,
title={Driver Behavior Detection using Video Swin Transformer},
author={C-Team},
year={2026}
}
```