# Driver Behavior Detection Model (Epoch 2)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

## Model Description

- **Architecture**: Video Swin Transformer Tiny (swin3d_t)
- **Backbone Pretrained**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)



## Classes (5)

| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | Normal | 0.93 |
| 1 | Drowsy Driving | 0.98 |
| 2 | Reaching/Searching | 0.90 |
| 3 | Phone Usage | 0.88 |
| 4 | Driver Assault | 1.00 |



## Performance (Epoch 2)

| Metric | Value |
|--------|-------|
| **Accuracy** | 95.15% |
| **Macro F1** | 0.9392 |
| **Validation Samples** | 1,371,062 |



## Training Configuration

| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 × 2 GPUs) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
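The optimizer, scheduler, and loss in the table map directly onto standard PyTorch components. A minimal sketch under assumptions: `total_steps` is a placeholder, the stand-in model is trivial, and the Mixup term is omitted:

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 5)  # stand-in for the real network

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3, total_steps=1000)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# One training step (gradient accumulation would divide the loss by 4
# and call optimizer.step() every 4th batch).
x, y = torch.randn(4, 768), torch.randint(0, 5, (4,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
scheduler.step()
```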



## Usage

```python
import torch
from model import DriverBehaviorModel

# Load model
model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Inference
# Input shape: [1, 3, 30, 224, 224] - 30 RGB frames at 224x224, normalized
video_tensor = torch.randn(1, 3, 30, 224, 224)  # placeholder; use real preprocessed frames
with torch.no_grad():
    output = model(video_tensor)
    prediction = output.argmax(dim=1)
```
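The snippet above assumes normalized RGB input. With a Kinetics-400-pretrained backbone, one plausible preprocessing step is per-channel normalization with the torchvision Kinetics statistics; the mean/std values below are the torchvision video presets and an assumption about this model's actual pipeline:

```python
import torch

# torchvision Kinetics-400 video normalization statistics (assumed)
MEAN = torch.tensor([0.43216, 0.394666, 0.37645])
STD = torch.tensor([0.22803, 0.22145, 0.216989])

def preprocess(frames: torch.Tensor) -> torch.Tensor:
    """frames: [T, H, W, 3] uint8 -> [1, 3, T, H, W] float, normalized."""
    x = frames.float() / 255.0                 # scale to [0, 1]
    x = (x - MEAN) / STD                       # broadcast over the channel dim
    return x.permute(3, 0, 1, 2).unsqueeze(0)  # [1, 3, T, H, W]

video_tensor = preprocess(torch.randint(0, 256, (30, 224, 224, 3), dtype=torch.uint8))
```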


## Dataset

- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224×224
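With a 30-frame window and a stride of 15, the start indices of the sampling windows for a clip can be computed as:

```python
def window_starts(num_frames: int, window: int = 30, stride: int = 15) -> list:
    """Start indices of all full windows that fit in a clip."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))

# A 90-frame clip yields five windows
print(window_starts(90))  # [0, 15, 30, 45, 60]
```

Overlapping windows (stride = window/2) are what grow 243,979 videos into 1.37M training samples.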

## Augmentation (Training)

- RandomResizedCrop (scale 0.8-1.0)
- HorizontalFlip (p=0.5)
- ColorJitter, HueSaturationValue
- Temporal Augmentation (speed change, frame drop)
- Mixup (α=0.4)
- CoarseDropout
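Mixup with α=0.4 blends pairs of samples and combines their losses with the same coefficient. A minimal sketch for video batches, following the [B, 3, T, H, W] input convention (small spatial size used purely for illustration):

```python
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    """Return mixed inputs, both label sets, and the mixing coefficient."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    return x_mixed, y, y[perm], lam

# The loss uses the same convex combination:
# loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
x = torch.randn(4, 3, 30, 8, 8)
y = torch.randint(0, 5, (4,))
x_mixed, y_a, y_b, lam = mixup(x, y)
```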

## License

This model is for research purposes only.

## Citation

```
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```