---

language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1
---


# Driver Behavior Detection Model (Epoch 7)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

## Model Description

- **Architecture**: Video Swin Transformer Tiny (`swin3d_t`)
- **Backbone pretraining**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: `[B, 3, 30, 224, 224]` (batch, channels, frames, height, width)
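
As a consistency check on the 27.85M figure, here is a short calculation. It assumes, since the card does not say so, that `model.py` wraps torchvision's `swin3d_t`, which has 28,158,070 parameters with its 400-class Kinetics head (a `Linear(768, 400)` layer), and that only that head is swapped for a 5-class one. The helper names are hypothetical.

```python
# Sketch: reconcile the reported 27.85M parameters with a swin3d_t backbone.
# ASSUMPTION: model.py reuses torchvision's swin3d_t (28,158,070 params with
# its 400-class Kinetics head) and replaces only the final Linear(768, 400).
SWIN3D_T_KINETICS400_PARAMS = 28_158_070
HEAD_IN_FEATURES = 768


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def params_with_new_head(num_classes: int) -> int:
    """Total parameter count after swapping the 400-way head for a new one."""
    return (
        SWIN3D_T_KINETICS400_PARAMS
        - linear_params(HEAD_IN_FEATURES, 400)
        + linear_params(HEAD_IN_FEATURES, num_classes)
    )


total = params_with_new_head(num_classes=5)
print(f"{total:,} parameters = {total / 1e6:.2f}M")  # 27,854,315 parameters = 27.85M
```

Dropout has no parameters, so the Dropout (0.3) listed under Training Configuration does not change the count.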



## Classes (5)



| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | Normal | 0.97 |
| 1 | Drowsy Driving | 0.99 |
| 2 | Reaching/Searching | 0.96 |
| 3 | Phone Usage | 0.96 |
| 4 | Driver Assault | 1.00 |
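
To turn model output into one of the labels above, a minimal sketch is shown below. It assumes the output is a row of 5 raw logits in label order (the card does not say whether a softmax is applied inside `model.py` or the ONNX graph); `CLASS_NAMES`, `softmax`, and `decode` are hypothetical helpers, not part of this repository.

```python
import math
from typing import Dict, List, Tuple

# Label indices as listed in the Classes table.
CLASS_NAMES: Dict[int, str] = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[int, str, float]:
    """Return (label index, class name, probability) of the top-scoring class."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, CLASS_NAMES[label], probs[label]


label, name, prob = decode([0.1, 3.0, 0.2, 0.0, -1.0])
print(label, name)  # 1 Drowsy Driving
```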



## Performance (Epoch 7)



| Metric | Value |
|--------|-------|
| **Accuracy** | 98.05% |
| **Macro F1** | 0.9757 |
| **Validation Samples** | 1,371,062 |
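
Macro F1 is the unweighted mean of the per-class F1 scores, so each of the five classes counts equally regardless of how many validation windows it has. The pure-Python sketch below computes both metrics from label lists using the usual definition (averaging over labels present in either list, as scikit-learn's `average="macro"` does); the card does not state which implementation produced the numbers above. As a sanity check, the mean of the rounded per-class F1 scores in the Classes table, (0.97 + 0.99 + 0.96 + 0.96 + 1.00) / 5 = 0.976, agrees with the reported 0.9757 up to rounding.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); guard against division by zero.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```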



## Training Configuration



| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48 GB each) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 per GPU x 2 GPUs) |
| Gradient Accumulation | 4 steps |
| Effective Batch | 128 (32 x 4) |
| Optimizer | AdamW (lr=1e-3, weight decay=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy with label smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
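
The effective batch is per-GPU batch × number of GPUs × accumulation steps (16 × 2 × 4 = 128). The sketch below also shows one common way label smoothing (ε = 0.1) and Mixup (α = 0.4) combine into soft targets: each one-hot target becomes 1 − ε + ε/K on the true class and ε/K elsewhere (the convention of PyTorch's `CrossEntropyLoss(label_smoothing=...)`), and two samples are blended with λ ~ Beta(α, α). The training code is not included in this card, so this combination is an assumption, and `smooth_one_hot` and `mixup_targets` are hypothetical helpers.

```python
import random
from typing import List, Tuple

NUM_CLASSES = 5
LABEL_SMOOTHING = 0.1
MIXUP_ALPHA = 0.4


def effective_batch(per_gpu: int, num_gpus: int, accum_steps: int) -> int:
    """Samples contributing to one optimizer step under DDP + gradient accumulation."""
    return per_gpu * num_gpus * accum_steps


def smooth_one_hot(label: int, num_classes: int = NUM_CLASSES,
                   eps: float = LABEL_SMOOTHING) -> List[float]:
    """(1 - eps) * one_hot + eps / K, as in CrossEntropyLoss(label_smoothing=eps)."""
    return [(1.0 - eps) * (i == label) + eps / num_classes for i in range(num_classes)]


def mixup_targets(label_a: int, label_b: int, rng: random.Random,
                  alpha: float = MIXUP_ALPHA) -> Tuple[float, List[float]]:
    """Draw lam ~ Beta(alpha, alpha) and blend the two smoothed targets.

    The same lam would blend the input clips: clip = lam * clip_a + (1 - lam) * clip_b.
    """
    lam = rng.betavariate(alpha, alpha)
    a = smooth_one_hot(label_a)
    b = smooth_one_hot(label_b)
    return lam, [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


print(effective_batch(16, 2, 4))                 # 128
print([round(p, 2) for p in smooth_one_hot(1)])  # [0.02, 0.92, 0.02, 0.02, 0.02]
lam, target = mixup_targets(label_a=1, label_b=3, rng=random.Random(0))
print(round(sum(target), 6))                     # 1.0
```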



## Files



| File | Size | Description |
|------|:----:|-------------|
| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
| `model.onnx` | 164 MB | ONNX model for mobile deployment |
| `config.json` | 1.2 KB | Model configuration |
| `model.py` | 6.9 KB | Model architecture code |
| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |

## Platform-specific Usage

### PyTorch (Server/Desktop)

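```python
import torch

from model import DriverBehaviorModel  # model.py from this repository

# pretrained=False presumably skips the Kinetics-400 backbone weights,
# since the fine-tuned weights are loaded from the checkpoint below.
model = DriverBehaviorModel(num_classes=5, pretrained=False)

# The checkpoint is a dict whose "model" entry holds the state dict.
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()  # inference mode: disables dropout

# One preprocessed clip [1, 3, 30, 224, 224]; assumes forward() returns class scores [1, 5].
with torch.no_grad():
    scores = model(torch.zeros(1, 3, 30, 224, 224))
```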

### iOS (CoreML)

1. Copy `model.onnx` to a Mac.
2. Run the conversion script:
   ```bash
   python convert_coreml_macos.py
   ```
3. Add the generated `DriverBehavior.mlpackage` to your Xcode project.

### Android (ONNX Runtime)

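```kotlin
// build.gradle (app module)
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

// Kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import java.nio.FloatBuffer

val env = OrtEnvironment.getEnvironment()
val session = env.createSession(assetManager.open("model.onnx").readBytes())

// `input`: preprocessed clip flattened to a FloatArray of 1*3*30*224*224 values
val shape = longArrayOf(1, 3, 30, 224, 224)
OnnxTensor.createTensor(env, FloatBuffer.wrap(input), shape).use { inputTensor ->
    session.run(mapOf("video_input" to inputTensor)).use { output ->
        // Assumes a single output of shape [1, 5] holding the class scores
        val scores = (output[0].value as Array<FloatArray>)[0]
    }
}
```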

## Preprocessing (All Platforms)

```
Input Shape:   [1, 3, 30, 224, 224]  (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
  - mean = [0.485, 0.456, 0.406]
  - std  = [0.229, 0.224, 0.225]
Resize:        224x224 (bilinear)
Frames:        30 frames, uniformly sampled
```
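
The steps above can be sketched in NumPy as follows. The sketch assumes the decoded frames are already RGB and resized to 224x224 (the bilinear resize is left to your video or image library) and picks 30 evenly spaced frame indices with `np.linspace`; it is not the project's own preprocessing code, and `sample_indices` and `preprocess_clip` are hypothetical names.

```python
import numpy as np

NUM_FRAMES = 30
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def sample_indices(num_available: int, num_frames: int = NUM_FRAMES) -> np.ndarray:
    """Evenly spaced frame indices across the clip (indices repeat if it is short)."""
    return np.linspace(0, num_available - 1, num_frames).round().astype(np.int64)


def preprocess_clip(frames: np.ndarray) -> np.ndarray:
    """frames: uint8 [T, 224, 224, 3] in RGB order -> float32 [1, 3, 30, 224, 224]."""
    clip = frames[sample_indices(len(frames))].astype(np.float32) / 255.0
    clip = (clip - MEAN) / STD           # per-channel normalization on the last axis
    clip = clip.transpose(3, 0, 1, 2)    # [T, H, W, C] -> [C, T, H, W]
    return np.ascontiguousarray(clip[np.newaxis])  # add batch dim -> [1, C, T, H, W]


video = np.random.randint(0, 256, size=(75, 224, 224, 3), dtype=np.uint8)
x = preprocess_clip(video)
print(x.shape, x.dtype)  # (1, 3, 30, 224, 224) float32
```

The result is the tensor fed to the `video_input` input in the ONNX Runtime example. If inference should instead match the 30-consecutive-frame windows described under Dataset, replace `sample_indices` with a contiguous slice of 30 frames.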

## Dataset

- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224x224
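
With a 30-frame window and a stride of 15, consecutive windows overlap by half. The sketch below enumerates window start frames for one video, assuming videos shorter than 30 frames yield no window and a trailing partial window is dropped; the card does not say how such edge cases were handled, and `window_starts` is a hypothetical helper.

```python
from typing import List

WINDOW = 30
STRIDE = 15


def window_starts(num_frames: int, window: int = WINDOW, stride: int = STRIDE) -> List[int]:
    """Start index of every full window; a trailing partial window is dropped."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


starts = window_starts(100)
print(starts)       # [0, 15, 30, 45, 60]
print(len(starts))  # 5 == (100 - 30) // 15 + 1
```

Under these assumptions a video of N ≥ 30 frames yields (N − 30) // 15 + 1 windows, which is how 243,979 videos expand into 1,371,062 samples (about 5.6 windows per video on average).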

## Training Progress

| Epoch | Accuracy | Macro F1 |
|:-----:|:--------:|:--------:|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| **7** | **98.05%** | **0.9757** |

## License

This model is for research purposes only.

## Citation

```bibtex
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```