---
language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1
---

# Driver Behavior Detection Model (Epoch 7)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

## Model Description

- **Architecture**: Video Swin Transformer Tiny (swin3d_t)
- **Backbone Pretrained**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: `[B, 3, 30, 224, 224]` (batch, channels, frames, height, width)

## Classes (5)

| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | 정상 (Normal) | 0.97 |
| 1 | 졸음운전 (Drowsy Driving) | 0.99 |
| 2 | 물건찾기 (Reaching/Searching) | 0.96 |
| 3 | 휴대폰 사용 (Phone Usage) | 0.96 |
| 4 | 운전자 폭행 (Driver Assault) | 1.00 |

## Performance (Epoch 7)

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.05% |
| **Macro F1** | 0.9757 |
| **Validation Samples** | 1,371,062 |

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 x 2 GPUs) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |

## Files

| File | Size | Description |
|------|:----:|-------------|
| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
| `model.onnx` | 164 MB | ONNX model for mobile deployment |
| `config.json` | 1.2 KB | Model configuration |
| `model.py` | 6.9 KB | Model architecture code |
| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |

## Platform-specific Usage

### PyTorch (Server/Desktop)

```python
import torch

from model import DriverBehaviorModel

model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()
```

### iOS (CoreML)

1. Copy `model.onnx` to macOS
2. Run the conversion script:
   ```bash
   python convert_coreml_macos.py
   ```
3. Add the generated `DriverBehavior.mlpackage` to your Xcode project

### Android (ONNX Runtime)

```kotlin
// build.gradle
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
```

```kotlin
val session = OrtEnvironment.getEnvironment()
    .createSession(assetManager.open("model.onnx").readBytes())
val output = session.run(mapOf("video_input" to inputTensor))
```

## Preprocessing (All Platforms)

```
Input Shape:   [1, 3, 30, 224, 224] (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
  - mean = [0.485, 0.456, 0.406]
  - std  = [0.229, 0.224, 0.225]
Resize:        224x224 (bilinear)
Frames:        30 frames uniformly sampled
```

## Dataset

- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224x224

## Training Progress

| Epoch | Accuracy | Macro F1 |
|:-----:|:--------:|:--------:|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| **7** | **98.05%** | **0.9757** |

## License

This model is for research purposes only.

## Citation

```
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```
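## Appendix: Preprocessing Sketch (PyTorch)

The preprocessing spec above (uniform 30-frame sampling, RGB, bilinear resize to 224x224, ImageNet normalization) can be sketched in PyTorch. This is a minimal illustration, not code shipped in the repository; the helper names `sample_indices` and `preprocess` are assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F

# ImageNet statistics from the spec above, shaped for [C, T, H, W] broadcasting
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)


def sample_indices(total_frames: int, num_frames: int = 30) -> np.ndarray:
    """Uniformly sample num_frames indices across the whole clip."""
    return np.linspace(0, total_frames - 1, num_frames).astype(int)


def preprocess(frames: np.ndarray) -> torch.Tensor:
    """Convert decoded RGB frames [T, H, W, 3] uint8 to a model input [1, 3, 30, 224, 224]."""
    idx = sample_indices(len(frames))
    clip = torch.from_numpy(frames[idx]).float() / 255.0  # [30, H, W, 3]
    clip = clip.permute(0, 3, 1, 2)                       # [30, 3, H, W]
    clip = F.interpolate(clip, size=(224, 224),
                         mode="bilinear", align_corners=False)
    clip = clip.permute(1, 0, 2, 3)                       # [3, 30, 224, 224]
    clip = (clip - MEAN) / STD                            # ImageNet normalization
    return clip.unsqueeze(0)                              # [1, 3, 30, 224, 224]
```

Frame decoding (e.g. via OpenCV or PyAV, with BGR→RGB conversion if needed) is left to the caller; the sketch starts from an already-decoded RGB frame stack.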
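## Appendix: Postprocessing Sketch (PyTorch)

The model emits 5 logits that map back to the class table above. A hedged sketch of turning them into a label and confidence; the `predict` helper and the English-only label list are illustrative and not part of `model.py`.

```python
import torch

# Indices follow the class table above (English names only, for brevity)
CLASSES = ["Normal", "Drowsy Driving", "Reaching/Searching",
           "Phone Usage", "Driver Assault"]


@torch.no_grad()
def predict(model, clip: torch.Tensor) -> tuple[str, float]:
    """clip: preprocessed [1, 3, 30, 224, 224] tensor -> (label, probability)."""
    logits = model(clip)                     # [1, 5]
    probs = torch.softmax(logits, dim=1)[0]  # [5], sums to 1
    idx = int(probs.argmax())
    return CLASSES[idx], float(probs[idx])
```

For streaming inference, the same helper can be called on each 30-frame window (stride 15, matching the dataset windowing above).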