# BaramNuri (바람누리) - Driver Behavior Detection Model
**바람누리** | *Wind that watches over the world*

A lightweight AI model for detecting abnormal driver behavior.

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) [![Python](https://img.shields.io/badge/Python-3.8+-green.svg)](https://python.org) [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org)
---

## Model Description

**바람누리 (BaramNuri)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-cabin camera footage.

### Key Features

- **Lightweight**: **49% fewer parameters** (14.20M) than the teacher model (27.86M)
- **High performance**: retains **98% of the teacher's performance** via knowledge distillation
- **Real-time**: deployable on edge devices (INT8: ~13 MB)
- **5-class classification**: normal, drowsy driving, searching for objects, phone use, driver assault

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      BaramNuri Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Input: [B, 3, 30, 224, 224]  (1-second clip, 30 fps)           │
│                   │                                             │
│                   ▼                                             │
│  ┌─────────────────────────────────────┐                        │
│  │ Video Swin-T (Stage 1-3)            │  ← Kinetics-400        │
│  │ Shifted Window Attention            │    Pretrained          │
│  │ Output: 384-dim features            │                        │
│  └─────────────────────────────────────┘                        │
│                   │                                             │
│                   ▼                                             │
│  ┌─────────────────────────────────────┐                        │
│  │ Selective SSM Block (x2)            │  ← Mamba-style         │
│  │ - 1D Conv for local context         │    Temporal            │
│  │ - Selective state space             │    Modeling            │
│  │ - Input-dependent B, C, delta       │                        │
│  └─────────────────────────────────────┘                        │
│                   │                                             │
│                   ▼                                             │
│  ┌─────────────────────────────────────┐                        │
│  │ Classification Head                 │                        │
│  │ LayerNorm → Dropout → Linear        │                        │
│  └─────────────────────────────────────┘                        │
│                   │                                             │
│                   ▼                                             │
│  Output: [B, 5]  (5-class logits)                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
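This document does not publish the internals of the Selective SSM block, so the following is only an illustrative, simplified sketch of a Mamba-style block with the three ingredients named in the diagram: a depthwise 1D convolution for local context, input-dependent `B`, `C`, and `delta` projections, and a sequential selective scan. Every name and dimension here is an assumption, and real Mamba implementations replace the Python loop with a fused parallel scan.

```python
import torch
import torch.nn as nn


class SelectiveSSMBlock(nn.Module):
    """Hypothetical, simplified Mamba-style selective SSM block.

    Operates on a token sequence of shape [B, T, D]
    (here T = temporal positions, D = 384 Swin features).
    """

    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        # Depthwise 1D conv for local temporal context
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # Input-dependent parameters: delta, B, C are projected from x
        self.proj_delta = nn.Linear(dim, dim)
        self.proj_B = nn.Linear(dim, state_dim)
        self.proj_C = nn.Linear(dim, state_dim)
        # A is a learned, input-independent state decay (kept negative)
        self.A_log = nn.Parameter(torch.zeros(dim, state_dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [B, T, D]
        residual = x
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)          # local context
        delta = torch.nn.functional.softplus(self.proj_delta(x))  # [B, T, D]
        B_t = self.proj_B(x)                                      # [B, T, N]
        C_t = self.proj_C(x)                                      # [B, T, N]
        A = -torch.exp(self.A_log)                                # [D, N]

        # Sequential selective scan:
        # h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t
        h = x.new_zeros(x.size(0), x.size(2), self.A_log.size(1))  # [B, D, N]
        outs = []
        for t in range(x.size(1)):
            decay = torch.exp(delta[:, t].unsqueeze(-1) * A)       # [B, D, N]
            h = decay * h + (
                delta[:, t].unsqueeze(-1)
                * B_t[:, t].unsqueeze(1)
                * x[:, t].unsqueeze(-1)
            )
            outs.append((h * C_t[:, t].unsqueeze(1)).sum(-1))      # [B, D]
        y = torch.stack(outs, dim=1)                               # [B, T, D]
        return self.norm(y + residual)
```

Because the scan is a first-order recurrence, inference cost grows linearly in sequence length, which is the O(n)-vs-O(n²) advantage over attention cited below.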
### Why This Architecture?

| Component | Purpose | Benefit |
|-----------|---------|---------|
| **Video Swin (Stage 1-3)** | Spatial feature extraction | Proven performance on video |
| **Stage 4 Removal** | 55% parameter reduction | Lightweight without quality loss |
| **Selective SSM** | Temporal modeling | O(n) complexity vs O(n²) attention |
| **Knowledge Distillation** | Performance retention | Learn from larger teacher model |

---

## Performance

### Classification Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 96.17% |
| **Macro F1** | 0.9504 |
| **Precision** | 0.95 |
| **Recall** | 0.95 |

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|-------|:---------:|:------:|:--------:|
| 정상 (Normal) | 0.93 | 0.93 | 0.93 |
| 졸음운전 (Drowsy) | 0.98 | 0.97 | 0.97 |
| 물건찾기 (Searching) | 0.93 | 0.95 | 0.94 |
| 휴대폰 사용 (Phone) | 0.94 | 0.93 | 0.94 |
| 운전자 폭행 (Assault) | 0.99 | 0.99 | 0.99 |

### Comparison with Teacher

| Metric | Teacher | BaramNuri | Comparison |
|--------|---------|-----------|------------|
| **Parameters** | 27.86M | 14.20M | **-49%** |
| **Model Size (FP32)** | ~106 MB | ~54 MB | **-49%** |
| **Model Size (INT8)** | ~26 MB | ~13 MB | **-50%** |
| **Accuracy** | 98.05% | 96.17% | 98.1% retained |
| **Macro F1** | 0.9757 | 0.9504 | 97.4% retained |

---

## Quick Start

### Installation

```bash
pip install torch torchvision
```

### Inference

```python
import torch
from model import BaramNuri

# Load model
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare input (1 second video, 30fps, 224x224)
# Shape: [batch, channels, frames, height, width]
video = torch.randn(1, 3, 30, 224, 224)

# Inference
with torch.no_grad():
    logits = model(video)
    probs = torch.softmax(logits, dim=-1)
    pred_class = probs.argmax(dim=-1).item()

# Class names
class_names = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
```

### With Prediction Helper

```python
# Single prediction with confidence
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")
```

---

## Input Specification

| Parameter | Value |
|-----------|-------|
| **Format** | `[B, C, T, H, W]` (BCTHW) |
| **Channels** | 3 (RGB) |
| **Frames** | 30 (1 second at 30fps) |
| **Resolution** | 224 x 224 |
| **Normalization** | ImageNet mean/std |

### Preprocessing

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
```

---

## Training Details

### Knowledge Distillation

```
Teacher: Video Swin-T (27.86M, 98.05% acc)
    │
    │  Soft Labels (Temperature=4.0)
    ▼
Student: BaramNuri (14.20M)
    │
    │  L = 0.5 * L_hard + 0.5 * L_soft
    ▼
Result: 96.17% acc (98% of teacher performance)
```

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.05 |
| Batch Size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL Divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |

---

## Deployment

### Server Deployment (GPU)

```python
model = BaramNuri(num_classes=5)
model.load_state_dict(torch.load('baramnuri_beta.pth')['model_state_dict'])
model = model.cuda().eval()

# FP16 for faster inference
model = model.half()
```

### Edge Deployment (INT8 Quantization)

```python
import torch.quantization as quant

model_int8 = quant.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# Model size: ~13MB
```

### ONNX Export

```python
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}}
)
```
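For reference, the distillation objective listed under Training Details (CE + KL divergence, T = 4.0, α = 0.5, i.e. L = 0.5 · L_hard + 0.5 · L_soft) can be sketched as below. The function name is hypothetical, and the T² scaling on the soft term follows the standard Hinton et al. (2015) formulation; whether BaramNuri's training code applies it is an assumption.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    """Sketch of L = alpha * L_hard + (1 - alpha) * L_soft.

    L_hard: cross-entropy against ground-truth labels.
    L_soft: KL divergence between temperature-softened teacher and
            student distributions, scaled by T^2 (assumed, per Hinton et al.).
    """
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1 - alpha) * soft
```

With α = 0.5 the hard and soft terms are weighted equally, matching the Alpha row in the Training Configuration table.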
---

## Use Cases

1. **Fleet Management**: Monitor driver behavior in commercial vehicles
2. **Insurance Telematics**: Risk assessment based on driving behavior
3. **ADAS Integration**: Advanced driver assistance systems
4. **Safety Research**: Analyze driving patterns and fatigue

---

## Limitations

- Trained on Korean driving environment data
- Requires a frontal camera facing the driver
- Optimal performance at 30fps input
- May require fine-tuning for different camera angles

---

## Citation

```bibtex
@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}
```

---

## License

This model is released under the [Apache 2.0 License](LICENSE).

---

## Acknowledgments

- Video Swin Transformer: Liu et al. (CVPR 2022)
- Knowledge Distillation: Hinton et al. (2015)
- Mamba/S4: Gu & Dao (2023)

---
**바람누리** - AI for safer driving

Made with care by C-Team