	---
	language:
	- ko
	license: apache-2.0
	tags:
	- video-classification
	- driver-behavior-detection
	- swin-transformer
	- video-swin
	- pytorch
	datasets:
	- custom
	metrics:
	- accuracy
	- f1
	pipeline_tag: video-classification
	model-index:
	- name: driver-behavior-swin3d-t
	  results:
	  - task:
	      type: video-classification
	      name: Video Classification
	    metrics:
	    - type: accuracy
	      value: 0.9805
	      name: Accuracy
	    - type: f1
	      value: 0.9757
	      name: Macro F1
	---

	# Driver Behavior Detection Model (Epoch 7)

	A Video Swin Transformer-based model for detecting abnormal driver behavior.

	## Model Description

	- Architecture: Video Swin Transformer Tiny (swin3d_t)
	- Backbone Pretrained: Kinetics-400
	- Parameters: 27.85M
	- Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)
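The parameter count can be sanity-checked with simple arithmetic. torchvision documents 28,158,070 parameters for `swin3d_t` with its 400-class Kinetics head. Assuming `model.py` keeps that backbone and swaps in a single 5-class linear layer on the 768-dimensional features (an assumption; the actual head in `model.py` may differ), the total rounds to the 27.85M listed above:

```python
# Assumption: the backbone matches torchvision's swin3d_t and the head is one Linear layer.
TORCHVISION_SWIN3D_T_PARAMS = 28_158_070  # documented count with the 400-class head
FEATURE_DIM = 768                         # final embedding width of the Tiny variant


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def params_with_new_head(num_classes: int) -> int:
    """Swap the 400-class head for a num_classes head and recount."""
    backbone = TORCHVISION_SWIN3D_T_PARAMS - linear_params(FEATURE_DIM, 400)
    return backbone + linear_params(FEATURE_DIM, num_classes)


total = params_with_new_head(5)
print(f"{total:,} parameters = {total / 1e6:.2f}M")
```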

	## Classes (5)

	| Label | Class | F1-Score |
	|:-----:|-------|:--------:|
	| 0 | 정상 (Normal) | 0.97 |
	| 1 | 졸음운전 (Drowsy Driving) | 0.99 |
	| 2 | 물건찾기 (Reaching/Searching) | 0.96 |
	| 3 | 휴대폰 사용 (Phone Usage) | 0.96 |
	| 4 | 운전자 폭행 (Driver Assault) | 1.00 |
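To turn a model output into one of these class names, a small lookup plus a softmax is enough. The sketch below assumes the model returns one raw logit per class in label order, which the card does not state explicitly; the `ID2LABEL` and `decode` names and the example logits are illustrative only.

```python
import math

# Label ids and English names copied from the table above.
ID2LABEL = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one clip, not real model output.
name, prob = decode([0.2, 3.1, -0.5, 0.0, -1.2])
print(name, round(prob, 3))
```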

	## Performance (Epoch 7)

	| Metric | Value |
	|--------|-------|
	| Accuracy | 98.05% |
	| Macro F1 | 0.9757 |
	| Validation Samples | 1,371,062 |
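Macro F1 is the unweighted mean of the per-class F1 scores. As a quick check, averaging the two-decimal scores from the Classes table reproduces the reported value to within rounding:

```python
# Per-class F1 scores as reported (rounded to two decimals) in the Classes table.
per_class_f1 = {0: 0.97, 1: 0.99, 2: 0.96, 3: 0.96, 4: 1.00}

# Unweighted mean over classes: every class counts equally, regardless of its sample count.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro F1 from rounded per-class scores: {macro_f1:.3f}")  # reported: 0.9757
```

The small gap to 0.9757 is what one would expect from rounding each class score before averaging.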

	## Training Configuration

	| Parameter | Value |
	|-----------|-------|
	| Hardware | 2x NVIDIA RTX A6000 (48GB) |
	| Distributed | DDP (DistributedDataParallel) |
	| Batch Size | 32 (16 per GPU x 2 GPUs) |
	| Gradient Accumulation | 4 steps |
	| Effective Batch | 128 |
	| Optimizer | AdamW (lr=1e-3, weight decay=0.05) |
	| Scheduler | OneCycleLR |
	| Mixed Precision | FP16 |
	| Loss | CrossEntropy + Label Smoothing (0.1) |
	| Regularization | Mixup (alpha=0.4), Dropout (0.3) |
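Two of these settings are easy to make concrete. The sketch below shows how the effective batch of 128 follows from the per-GPU batch, GPU count, and accumulation steps, and what a label-smoothed target looks like for 5 classes. The smoothing formula follows the convention of PyTorch's `CrossEntropyLoss(label_smoothing=...)`, which mixes the one-hot target with a uniform distribution; whether the training code used that exact implementation is an assumption, and both function names are illustrative.

```python
def effective_batch(per_gpu: int, num_gpus: int, accum_steps: int) -> int:
    """Samples contributing to one optimizer step under DDP with gradient accumulation."""
    return per_gpu * num_gpus * accum_steps


def smoothed_target(label: int, num_classes: int, eps: float) -> list:
    """Soft target: (1 - eps) on the true class plus eps spread uniformly over all classes."""
    uniform = eps / num_classes
    return [(1.0 - eps) * (1.0 if i == label else 0.0) + uniform for i in range(num_classes)]


print(effective_batch(per_gpu=16, num_gpus=2, accum_steps=4))
print([round(p, 2) for p in smoothed_target(label=1, num_classes=5, eps=0.1)])
```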

	## Files

	| File | Size | Description |
	|------|:----:|-------------|
	| `pytorch_model.bin` | 121 MB | PyTorch weights (FP32) |
	| `model.onnx` | 164 MB | ONNX model for mobile deployment |
	| `config.json` | 1.2 KB | Model configuration |
	| `model.py` | 6.9 KB | Model architecture code |
	| `convert_coreml_macos.py` | 2.2 KB | CoreML conversion script (macOS) |

	## Platform-specific Usage

	### PyTorch (Server/Desktop)

	```python
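	import torch
	from model import DriverBehaviorModel  # defined in model.py from this repo

	# pretrained=False: the checkpoint below supplies all weights
	model = DriverBehaviorModel(num_classes=5, pretrained=False)
	# The checkpoint is a dict; the state dict is stored under the "model" key
	checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
	model.load_state_dict(checkpoint["model"])
	model.eval()

	# Smoke test on a dummy clip, assuming forward() returns logits of shape [1, 5]
	with torch.no_grad():
	    logits = model(torch.randn(1, 3, 30, 224, 224))
	predicted_label = logits.argmax(dim=1).item()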
	```

	### iOS (CoreML)

	1. Copy `model.onnx` and `convert_coreml_macos.py` to a Mac
	2. Run the conversion script:
	```bash
	python convert_coreml_macos.py
	```
	3. Add the generated `DriverBehavior.mlpackage` to your Xcode project

	### Android (ONNX Runtime)

	```kotlin
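	// build.gradle (dependency declaration)
	implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

	// Kotlin
	import ai.onnxruntime.OrtEnvironment

	// Load the ONNX model bundled in the app's assets and create an inference session
	val env = OrtEnvironment.getEnvironment()
	val session = env.createSession(assetManager.open("model.onnx").readBytes())

	// inputTensor: an OnnxTensor of shape [1, 3, 30, 224, 224] for the input named "video_input"
	val output = session.run(mapOf("video_input" to inputTensor))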
	```

	## Preprocessing (All Platforms)

	```
	Input Shape: [1, 3, 30, 224, 224] (batch, channels, frames, height, width)
	Channel Order: RGB
	Normalization: (pixel / 255.0 - mean) / std
	- mean = [0.485, 0.456, 0.406]
	- std = [0.229, 0.224, 0.225]
	Resize: 224x224 (BILINEAR)
	Frames: 30 frames uniformly sampled
	```
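The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the project's own pipeline: it assumes the frames are already decoded to RGB and resized to 224x224 with bilinear interpolation (for example with OpenCV or Pillow), and it reads "uniformly sampled" as evenly spaced indices from the first to the last frame. The `sample_indices` and `preprocess` names are hypothetical.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
NUM_FRAMES = 30


def sample_indices(total_frames: int, num_frames: int = NUM_FRAMES) -> np.ndarray:
    """Evenly spaced frame indices from the first to the last frame."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(np.int64)


def preprocess(frames_rgb: np.ndarray) -> np.ndarray:
    """uint8 [T, 224, 224, 3] RGB frames -> float32 [1, 3, 30, 224, 224]."""
    clip = frames_rgb[sample_indices(len(frames_rgb))].astype(np.float32) / 255.0
    clip = (clip - MEAN) / STD                     # broadcasts over the last (channel) axis
    clip = clip.transpose(3, 0, 1, 2)              # [T, H, W, C] -> [C, T, H, W]
    return np.ascontiguousarray(clip[np.newaxis])  # add the batch axis


# Random stand-in for a decoded, already-resized 90-frame video.
video = np.random.randint(0, 256, size=(90, 224, 224, 3), dtype=np.uint8)
batch = preprocess(video)
print(batch.shape, batch.dtype)
```

The resulting array can be wrapped with `torch.from_numpy` for PyTorch or passed to ONNX Runtime as the input tensor.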

	## Dataset

	- Total Videos: 243,979
	- Total Samples (windows): 1,371,062
	- Window Size: 30 frames
	- Stride: 15 frames
	- Resolution: 224x224
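With a 30-frame window and a 15-frame stride, consecutive samples overlap by half. The sketch below shows how windows could be cut from a single video and derives the average number of windows per video from the totals above. Dropping a trailing partial window is an assumption, since the card does not say how leftovers are handled, and `window_starts` is an illustrative name.

```python
WINDOW = 30  # frames per sample
STRIDE = 15  # frames between consecutive window starts


def window_starts(num_frames: int, window: int = WINDOW, stride: int = STRIDE) -> list:
    """Start indices of every full window; a trailing partial window is dropped (assumption)."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


print(window_starts(100))
print(f"{1_371_062 / 243_979:.2f} windows per video on average")
```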

	## Training Progress

	| Epoch | Accuracy | Macro F1 |
	|:-----:|:--------:|:--------:|
	| 5 | 97.35% | 0.9666 |
	| 6 | 97.74% | 0.9720 |
	| 7 | 98.05% | 0.9757 |

	## License

	This model is for research purposes only.

	## Citation

	```
	@misc{driver-behavior-detection-2026,
	  title={Driver Behavior Detection using Video Swin Transformer},
	  author={C-Team},
	  year={2026}
	}
	```