---

language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1
---


# Driver Behavior Detection Model (Epoch 7)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

## Model Description

- **Architecture**: Video Swin Transformer Tiny (swin3d_t)
- **Backbone Pretrained**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)



## Classes (5)



| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | 정상 (Normal) | 0.97 |
| 1 | 졸음운전 (Drowsy Driving) | 0.99 |
| 2 | 물건찾기 (Reaching/Searching) | 0.96 |
| 3 | 휴대폰 사용 (Phone Usage) | 0.96 |
| 4 | 운전자 폭행 (Driver Assault) | 1.00 |



## Performance (Epoch 7)



| Metric | Value |
|--------|-------|
| **Accuracy** | 98.05% |
| **Macro F1** | 0.9757 |
| **Validation Samples** | 1,371,062 |
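
As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below recomputes it from the rounded values in the Classes table; it assumes the reported figure uses standard macro averaging, which this card does not state explicitly. Because the per-class scores are rounded to two decimals, the result only approximates the reported 0.9757.

```python
# Per-class F1 scores as listed (rounded to two decimals) in the Classes table.
per_class_f1 = {
    "Normal": 0.97,
    "Drowsy Driving": 0.99,
    "Reaching/Searching": 0.96,
    "Phone Usage": 0.96,
    "Driver Assault": 1.00,
}


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    return sum(scores.values()) / len(scores)


print(f"{macro_f1(per_class_f1):.3f}")  # prints 0.976
```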



## Training Configuration



| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 per GPU x 2 GPUs) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 (32 x 4) |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
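
To illustrate how the label smoothing and Mixup settings in the table interact, here is a minimal NumPy sketch. It is not the training code used for this model: the helper names `smooth_one_hot` and `mixup` are hypothetical, the Mixup value 0.4 is read as the Beta-distribution parameter alpha, and a single mixing coefficient is drawn per batch, which is one common variant.

```python
import numpy as np


def smooth_one_hot(labels, num_classes=5, smoothing=0.1):
    """Soft targets: s/K on every class, plus (1 - s) on the true class."""
    labels = np.asarray(labels)
    targets = np.full((len(labels), num_classes), smoothing / num_classes)
    targets[np.arange(len(labels)), labels] += 1.0 - smoothing
    return targets


def mixup(clips, targets, alpha=0.4, rng=None):
    """Blend each clip and its target with a random partner from the same batch."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(clips))  # partner index for every sample
    mixed_clips = lam * clips + (1.0 - lam) * clips[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_clips, mixed_targets, lam


# Tiny stand-in batch: 4 clips of shape [3, 2, 8, 8] instead of [3, 30, 224, 224].
rng = np.random.default_rng(0)
clips = rng.random((4, 3, 2, 8, 8))
targets = smooth_one_hot([0, 1, 3, 4])
mixed_clips, mixed_targets, lam = mixup(clips, targets, rng=rng)
```

Each row of `mixed_targets` still sums to 1, so it can be used directly as a soft target for a cross-entropy loss.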



## Files



| File | Size | Description |
|------|:----:|-------------|
| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
| `model.onnx` | 164 MB | ONNX model for mobile deployment |
| `config.json` | 1.2 KB | Model configuration |
| `model.py` | 6.9 KB | Model architecture code |
| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |

## Platform-specific Usage

### PyTorch (Server/Desktop)

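```python
import torch
from model import DriverBehaviorModel  # defined in model.py from this repository

# Build the architecture without downloading the Kinetics-400 backbone weights.
model = DriverBehaviorModel(num_classes=5, pretrained=False)

# The checkpoint stores the weights under the "model" key.
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()  # inference mode: disables dropout
```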
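
The snippet above stops at `model.eval()`. As a rough sketch of the next step, the helper below turns a `[B, 5]` array of scores into class names and probabilities using the label order from the Classes table. It assumes the model's forward pass returns raw logits, which this card does not confirm; `decode_logits` is a hypothetical name, and the example logits are made up.

```python
import numpy as np

# Label order from the Classes table (index = label id).
CLASS_NAMES = [
    "Normal",
    "Drowsy Driving",
    "Reaching/Searching",
    "Phone Usage",
    "Driver Assault",
]


def decode_logits(logits):
    """Return (class name, probability) for each row of a [B, 5] logits array."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)           # softmax over classes
    best = probs.argmax(axis=1)
    return [(CLASS_NAMES[i], float(probs[row, i])) for row, i in enumerate(best)]


# Made-up logits for one clip. With PyTorch they would come from something like
# model(clip).detach().cpu().numpy(), assuming the output is raw logits.
predictions = decode_logits([[0.1, 3.2, -0.5, 0.0, -1.0]])
print(predictions[0][0])  # prints Drowsy Driving
```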

### iOS (CoreML)

1. Copy `model.onnx` to a Mac.
2. Run the conversion script:
   ```bash
   python convert_coreml_macos.py
   ```
3. Add the generated `DriverBehavior.mlpackage` to your Xcode project.

### Android (ONNX Runtime)

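```kotlin
// build.gradle (Groovy DSL) dependency:
//   implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

import ai.onnxruntime.OrtEnvironment

// Load the ONNX model bytes from the app's assets and create an inference session.
// assetManager is the app's AssetManager (for example, context.assets).
val env = OrtEnvironment.getEnvironment()
val session = env.createSession(assetManager.open("model.onnx").readBytes())

// inputTensor is an OnnxTensor of shape [1, 3, 30, 224, 224]; "video_input" is the model's input name.
val output = session.run(mapOf("video_input" to inputTensor))
```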

## Preprocessing (All Platforms)

```
Input Shape: [1, 3, 30, 224, 224]  (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
  - mean = [0.485, 0.456, 0.406]
  - std = [0.229, 0.224, 0.225]
Resize: 224x224 (BILINEAR)
Frames: 30 frames uniformly sampled
```
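
The specification above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the project's preprocessing code: `preprocess_clip` is a hypothetical helper, the frames are assumed to be RGB `uint8` and already resized to 224x224 with bilinear interpolation, and "uniformly sampled" is interpreted as evenly spaced indices across the clip.

```python
import numpy as np

# ImageNet-style channel statistics from the preprocessing spec (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_clip(frames, num_frames=30):
    """Turn [T, 224, 224, 3] uint8 RGB frames into a [1, 3, 30, 224, 224] float32 array."""
    frames = np.asarray(frames)
    # Evenly spaced frame indices from the first to the last frame.
    idx = np.linspace(0, len(frames) - 1, num_frames).round().astype(int)
    clip = frames[idx].astype(np.float32) / 255.0  # [30, H, W, 3], values in [0, 1]
    clip = (clip - MEAN) / STD                     # per-channel normalization
    clip = clip.transpose(3, 0, 1, 2)              # -> [3, 30, H, W]
    return clip[np.newaxis]                        # -> [1, 3, 30, H, W]


# Example: a 45-frame all-black clip.
dummy = np.zeros((45, 224, 224, 3), dtype=np.uint8)
batch = preprocess_clip(dummy)
print(batch.shape)  # prints (1, 3, 30, 224, 224)
```

The resulting array can be wrapped with `torch.from_numpy` for PyTorch or copied into an ONNX Runtime tensor on mobile.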

## Dataset

- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224x224
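
With a 30-frame window and a 15-frame stride, consecutive windows overlap by half, and the totals above work out to roughly 5.6 windows per video on average (1,371,062 / 243,979). The sketch below shows one way such windows could be enumerated; `window_starts` is a hypothetical helper, and dropping the trailing partial window is an assumption, since the card does not say how leftover frames are handled.

```python
def window_starts(num_frames, window=30, stride=15):
    """Start indices of every full window that fits in a video of num_frames frames."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


# A 90-frame video yields five overlapping 30-frame windows.
print(window_starts(90))  # prints [0, 15, 30, 45, 60]
```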

## Training Progress

| Epoch | Accuracy | Macro F1 |
|:-----:|:--------:|:--------:|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| **7** | **98.05%** | **0.9757** |

## License

This model is for research purposes only.

## Citation

```
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```