# BaramNuri (바람누리) - Driver Behavior Detection Model

<div align="center">

**바람누리** | *Wind that watches over the world*

A lightweight AI model for detecting abnormal driver behavior

[License](LICENSE) · [Python](https://python.org) · [PyTorch](https://pytorch.org)

</div>

---
## Model Description

**BaramNuri (바람누리)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-cabin camera footage.

### Key Features

- **Lightweight**: **49% fewer parameters** (14.20M) than the teacher model (27.86M)
- **High performance**: retains **98% of the teacher's performance** via knowledge distillation
- **Real-time**: deployable on edge devices (INT8: ~13 MB)
- **5-class classification**: normal, drowsy driving, searching for objects, phone use, driver assault

---
## Architecture

```
Input: [B, 3, 30, 224, 224]  (1-second clip, 30 fps)
        │
        ▼
┌──────────────────────────────────────┐
│ Video Swin-T (Stages 1-3)            │ ← Kinetics-400 pretrained
│ Shifted window attention             │
│ Output: 384-dim features             │
└──────────────────────────────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│ Selective SSM Block (x2)             │ ← Mamba-style temporal modeling
│ - 1D conv for local context          │
│ - Selective state space              │
│ - Input-dependent B, C, delta        │
└──────────────────────────────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│ Classification Head                  │
│ LayerNorm → Dropout → Linear         │
└──────────────────────────────────────┘
        │
        ▼
Output: [B, 5]  (5-class logits)
```
### Why This Architecture?

| Component | Purpose | Benefit |
|-----------|---------|---------|
| **Video Swin (Stages 1-3)** | Spatial feature extraction | Proven performance on video |
| **Stage 4 removal** | 55% parameter reduction | Lightweight without quality loss |
| **Selective SSM** | Temporal modeling | O(n) complexity vs. O(n²) attention |
| **Knowledge distillation** | Performance retention | Learns from a larger teacher model |
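The O(n) claim for the Selective SSM comes from its sequential scan: one constant-cost state update per frame. A deliberately simplified, scalar-state sketch of that recurrence (variable names are illustrative; in the actual block, B, C, and delta are produced from the input by learned projections, which is what makes the state space "selective"):

```python
import math

def selective_scan(x, delta, B, C, A=-1.0):
    """Minimal scalar-state selective scan: one recurrence step per frame.

    h[t] = exp(delta[t] * A) * h[t-1] + delta[t] * B[t] * x[t]
    y[t] = C[t] * h[t]

    Runs in O(n) over the sequence, unlike O(n^2) self-attention.
    """
    h, ys = 0.0, []
    for xt, dt, bt, ct in zip(x, delta, B, C):
        a_bar = math.exp(dt * A)          # discretized state decay
        h = a_bar * h + dt * bt * xt      # input-dependent state update
        ys.append(ct * h)                 # input-dependent readout
    return ys
```

Because each step depends only on the previous state, memory and compute stay constant per frame regardless of clip length.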
---
## Performance

### Classification Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 96.17% |
| **Macro F1** | 0.9504 |
| **Precision** | 0.95 |
| **Recall** | 0.95 |

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|-------|:---------:|:------:|:--------:|
| Normal (정상) | 0.93 | 0.93 | 0.93 |
| Drowsy (졸음운전) | 0.98 | 0.97 | 0.97 |
| Searching (물건찾기) | 0.93 | 0.95 | 0.94 |
| Phone use (휴대폰 사용) | 0.94 | 0.93 | 0.94 |
| Assault (운전자 폭행) | 0.99 | 0.99 | 0.99 |

### Comparison with Teacher

| Metric | Teacher | BaramNuri | Comparison |
|--------|---------|-----------|------------|
| **Parameters** | 27.86M | 14.20M | **-49%** |
| **Model Size (FP32)** | ~106 MB | ~54 MB | **-49%** |
| **Model Size (INT8)** | ~26 MB | ~13 MB | **-50%** |
| **Accuracy** | 98.05% | 96.17% | 98.1% retained |
| **Macro F1** | 0.9757 | 0.9504 | 97.4% retained |
---
## Quick Start

### Installation

```bash
pip install torch torchvision
```

### Inference

```python
import torch
from model import BaramNuri

# Load model
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare input (1-second clip, 30 fps, 224x224)
# Shape: [batch, channels, frames, height, width]
video = torch.randn(1, 3, 30, 224, 224)

# Inference
with torch.no_grad():
    logits = model(video)
    probs = torch.softmax(logits, dim=-1)
    pred_class = probs.argmax(dim=-1).item()

# Class names (normal, drowsy, searching, phone use, assault)
class_names = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
```
### With Prediction Helper

```python
# Single prediction with confidence
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")
```
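The `predict` helper ships with the checkpoint's model class. If you are wiring up your own wrapper around a raw forward pass, a minimal sketch might look like the following (the `CLASS_NAMES` ordering is an assumption and must match the order used at training time):

```python
import torch

# Assumed label order (English glosses of the training labels).
CLASS_NAMES = ["Normal", "Drowsy", "Searching", "Phone use", "Assault"]

def predict(model, video, class_names=CLASS_NAMES):
    """Classify one clip and return the top class with its confidence.

    video: float tensor of shape [1, 3, 30, 224, 224], already normalized.
    """
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(video), dim=-1)
    conf, idx = probs.max(dim=-1)
    return {
        "class_id": idx.item(),
        "class_name": class_names[idx.item()],
        "confidence": conf.item(),
    }
```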
---

## Input Specification

| Parameter | Value |
|-----------|-------|
| **Format** | `[B, C, T, H, W]` (BCTHW) |
| **Channels** | 3 (RGB) |
| **Frames** | 30 (1 second at 30 fps) |
| **Resolution** | 224 x 224 |
| **Normalization** | ImageNet mean/std |

### Preprocessing

```python
from torchvision import transforms

# Applied to each frame (as a PIL image) individually
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
```
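The transform above produces one `[C, H, W]` tensor per frame; to build the `[B, C, T, H, W]` input the model expects, the 30 transformed frames are stacked along a new time axis and the result permuted. A sketch (the helper name is illustrative):

```python
import torch

def frames_to_clip(frames):
    """Stack per-frame tensors [C, H, W] into a model input [1, C, T, H, W]."""
    clip = torch.stack(frames, dim=0)     # [T, C, H, W]
    clip = clip.permute(1, 0, 2, 3)       # [C, T, H, W] (BCTHW frame order)
    return clip.unsqueeze(0)              # add batch dim -> [1, C, T, H, W]
```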
---

## Training Details

### Knowledge Distillation

```
Teacher: Video Swin-T (27.86M params, 98.05% accuracy)
        │
        │  soft labels (temperature = 4.0)
        ▼
Student: BaramNuri (14.20M params)
        │
        │  L = 0.5 * L_hard + 0.5 * L_soft
        ▼
Result: 96.17% accuracy (≈98% of teacher performance)
```
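The combined loss in the diagram (alpha = 0.5, temperature = 4.0) follows the standard Hinton-style distillation objective. A sketch of one possible implementation (not necessarily the exact training code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=4.0):
    """L = alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(soft targets)."""
    # Hard loss: cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    # Soft loss: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```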
### Training Configuration

| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.05 |
| Batch Size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |

---
## Deployment

### Server Deployment (GPU)

```python
model = BaramNuri(num_classes=5)
model.load_state_dict(torch.load('baramnuri_beta.pth')['model_state_dict'])
model = model.cuda().eval()

# FP16 for faster inference (inputs must also be converted with .half())
model = model.half()
```

### Edge Deployment (INT8 Quantization)

```python
import torch.quantization as quant

# Dynamic quantization of the Linear layers; model size drops to ~13 MB
model_int8 = quant.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```

### ONNX Export

```python
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}},
)
```
---

## Use Cases

1. **Fleet Management**: Monitor driver behavior in commercial vehicles
2. **Insurance Telematics**: Risk assessment based on driving behavior
3. **ADAS Integration**: Advanced driver assistance systems
4. **Safety Research**: Analyze driving patterns and fatigue

---

## Limitations

- Trained on Korean driving-environment data
- Requires a frontal camera facing the driver
- Optimal performance at 30 fps input
- May require fine-tuning for different camera angles
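For adapting to a new camera angle, a common recipe is to freeze the pretrained backbone and retrain only the classification head first. A hypothetical sketch (it assumes the head is exposed as `model.head`, which may differ in the actual model class):

```python
import torch

def make_head_only_optimizer(model, lr=1e-4, weight_decay=0.05):
    """Freeze the backbone and return an optimizer over the head only."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.head.parameters():      # hypothetical attribute name
        p.requires_grad = True
    return torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad),
        lr=lr, weight_decay=weight_decay,
    )
```

The hyperparameters mirror the training configuration above; once the head converges, the backbone can optionally be unfrozen at a lower learning rate.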
---
## Citation

```bibtex
@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}
```

---

## License

This model is released under the [Apache 2.0 License](LICENSE).

---

## Acknowledgments

- Video Swin Transformer: Liu et al. (CVPR 2022)
- Knowledge Distillation: Hinton et al. (2015)
- Mamba/S4: Gu & Dao (2023)

---

<div align="center">

**바람누리** - AI for safe driving

Made with care by C-Team

</div>