# BaramNuri (바람누리) - Driver Behavior Detection Model

<div align="center">

**바람누리** | *Wind that watches over the world*

A lightweight AI model for detecting abnormal driver behavior

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.8+-green.svg)](https://python.org)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org)

</div>

---

## Model Description

**BaramNuri (바람누리)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-cabin camera video.

### Key Features

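- **Lightweight**: **49% fewer parameters** than the teacher model (14.20M vs. 27.86M)
- **Accurate**: knowledge distillation **retains about 98% of the teacher's accuracy**
- **Real-time**: deployable on edge devices (INT8: ~13 MB)
- **5-class classification**: normal, drowsy driving, searching for objects, phone use, driver assault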

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                     BaramNuri Architecture                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Input: [B, 3, 30, 224, 224] (1-second clip, 30 fps)           │
│                         │                                       │
│                         ▼                                       │
│   ┌─────────────────────────────────────┐                       │
│   │     Video Swin-T (Stage 1-3)        │  ← Kinetics-400       │
│   │     Shifted Window Attention        │     Pretrained        │
│   │     Output: 384 dim features        │                       │
│   └─────────────────────────────────────┘                       │
│                         │                                       │
│                         ▼                                       │
│   ┌─────────────────────────────────────┐                       │
│   │     Selective SSM Block (x2)        │  ← Mamba-style        │
│   │     - 1D Conv for local context     │     Temporal          │
│   │     - Selective state space         │     Modeling          │
│   │     - Input-dependent B, C, delta   │                       │
│   └─────────────────────────────────────┘                       │
│                         │                                       │
│                         ▼                                       │
│   ┌─────────────────────────────────────┐                       │
│   │     Classification Head             │                       │
│   │     LayerNorm → Dropout → Linear    │                       │
│   └─────────────────────────────────────┘                       │
│                         │                                       │
│                         ▼                                       │
│   Output: [B, 5] (5-class logits)                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Why This Architecture?

| Component | Purpose | Benefit |
|-----------|---------|---------|
| **Video Swin (Stage 1-3)** | Spatial feature extraction | Proven performance on video |
| **Stage 4 Removal** | 55% parameter reduction | Lighter model at a small accuracy cost (96.17% vs. 98.05% for the teacher) |
| **Selective SSM** | Temporal modeling | O(n) complexity vs O(n²) attention |
| **Knowledge Distillation** | Performance retention | Learn from larger teacher model |
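
To make the temporal-modeling row more concrete, here is a minimal sketch of a Mamba-style selective scan in PyTorch. It is not BaramNuri's actual block: the class name `TinySelectiveSSM`, the state size of 16, the kernel size of 3, and the exact parameterization are assumptions chosen for illustration. It shows a causal 1D convolution for local context followed by a recurrence whose B, C, and delta are computed from the input, evaluated in one pass over the T time steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySelectiveSSM(nn.Module):
    """Illustrative selective state-space block (hypothetical, not the released code)."""

    def __init__(self, dim: int = 384, state: int = 16, kernel: int = 3):
        super().__init__()
        self.state = state
        # Depthwise conv padded by kernel - 1; forward() trims it to be causal.
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel - 1, groups=dim)
        # One projection produces the input-dependent B, C, and delta.
        self.to_bcd = nn.Linear(dim, 2 * state + dim)
        # A is kept negative so the hidden state decays instead of growing.
        self.A_log = nn.Parameter(torch.zeros(dim, state))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, time, dim]
        batch, steps, dim = x.shape
        u = self.conv(x.transpose(1, 2))[..., :steps].transpose(1, 2)
        u = F.silu(u)
        B, C, delta = torch.split(
            self.to_bcd(u), [self.state, self.state, dim], dim=-1
        )
        delta = F.softplus(delta)           # positive step size, [batch, time, dim]
        A = -torch.exp(self.A_log)          # [dim, state]
        h = x.new_zeros(batch, dim, self.state)
        outputs = []
        for t in range(steps):              # one pass over time: linear in sequence length
            decay = torch.exp(delta[:, t, :, None] * A)
            update = delta[:, t, :, None] * B[:, t, None, :] * u[:, t, :, None]
            h = decay * h + update
            outputs.append((h * C[:, t, None, :]).sum(dim=-1))
        return torch.stack(outputs, dim=1)  # [batch, time, dim]
```

Each time step does a fixed amount of work, so the cost grows linearly with sequence length, whereas self-attention compares every pair of time steps.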

---

## Performance

### Classification Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 96.17% |
| **Macro F1** | 0.9504 |
| **Precision** | 0.95 |
| **Recall** | 0.95 |

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|-------|:---------:|:------:|:--------:|
| Normal | 0.93 | 0.93 | 0.93 |
| Drowsy driving | 0.98 | 0.97 | 0.97 |
| Searching for objects | 0.93 | 0.95 | 0.94 |
| Phone use | 0.94 | 0.93 | 0.94 |
| Driver assault | 0.99 | 0.99 | 0.99 |

### Comparison with Teacher

| Metric | Teacher | BaramNuri | Comparison |
|--------|---------|-----------|------------|
| **Parameters** | 27.86M | 14.20M | **-49%** |
| **Model Size (FP32)** | ~106 MB | ~54 MB | **-49%** |
| **Model Size (INT8)** | ~26 MB | ~13 MB | **-50%** |
| **Accuracy** | 98.05% | 96.17% | 98.1% retained |
| **Macro F1** | 0.9757 | 0.9504 | 97.4% retained |
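
The reductions and retention ratios in this table follow directly from the raw figures. The check below is a small sketch, assuming the model sizes are weights-only estimates expressed in MiB (2^20 bytes); the table does not state this, and the helper name `weights_size_mib` is made up for the example.

```python
# Figures copied from the comparison table above.
TEACHER_PARAMS = 27.86e6
STUDENT_PARAMS = 14.20e6


def weights_size_mib(num_params: float, bytes_per_param: int) -> float:
    """Weights-only size estimate in MiB (assumes 1 MiB = 2**20 bytes)."""
    return num_params * bytes_per_param / 2**20


param_reduction = 1 - STUDENT_PARAMS / TEACHER_PARAMS
accuracy_retained = 96.17 / 98.05
macro_f1_retained = 0.9504 / 0.9757

print(f"Parameter reduction: {param_reduction:.1%}")
print(f"FP32 size: {weights_size_mib(TEACHER_PARAMS, 4):.1f} -> "
      f"{weights_size_mib(STUDENT_PARAMS, 4):.1f} MiB")
print(f"INT8 size: {weights_size_mib(TEACHER_PARAMS, 1):.1f} -> "
      f"{weights_size_mib(STUDENT_PARAMS, 1):.1f} MiB")
print(f"Accuracy retained: {accuracy_retained:.1%}")
print(f"Macro F1 retained: {macro_f1_retained:.1%}")
```

The INT8 estimate assumes every parameter is stored in one byte. When only some layer types are quantized, the file on disk can be larger.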

---

## Quick Start

### Installation

```bash
pip install torch torchvision
```

### Inference

```python
import torch
from model import BaramNuri

# Load model
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare input (1 second video, 30fps, 224x224)
# Shape: [batch, channels, frames, height, width]
# Random placeholder; use a preprocessed clip in practice
video = torch.randn(1, 3, 30, 224, 224)

# Inference
with torch.no_grad():
    logits = model(video)
    probs = torch.softmax(logits, dim=-1)
    pred_class = probs.argmax(dim=-1).item()

# Class names, in label-index order
class_names = ["Normal", "Drowsy driving", "Searching for objects", "Phone use", "Driver assault"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
```

### With Prediction Helper

```python
# Single prediction with confidence
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")
```

---

## Input Specification

| Parameter | Value |
|-----------|-------|
| **Format** | `[B, C, T, H, W]` (BCTHW) |
| **Channels** | 3 (RGB) |
| **Frames** | 30 (1 second at 30fps) |
| **Resolution** | 224 x 224 |
| **Normalization** | ImageNet mean/std |

### Preprocessing

```python
from torchvision import transforms

# Per-frame transform: apply to each RGB frame (PIL image) of the clip
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
```
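
The transform above works on a single frame, while the model expects a `[B, C, T, H, W]` clip. The following sketch shows one way to assemble that clip from per-frame tensors. The helper name `frames_to_clip` and the uniform temporal sampling are assumptions for illustration, not part of the released code.

```python
import torch


def frames_to_clip(frame_tensors, num_frames: int = 30) -> torch.Tensor:
    """Stack per-frame tensors of shape [3, H, W] into a [1, 3, T, H, W] clip.

    Hypothetical helper: frames are sampled uniformly (or repeated) so the
    clip always has exactly `num_frames` frames.
    """
    if len(frame_tensors) == 0:
        raise ValueError("need at least one frame")
    # Evenly spaced indices over the available frames.
    idx = torch.linspace(0, len(frame_tensors) - 1, num_frames).round().long()
    # Stack along a new time axis placed after the channel axis: [3, T, H, W].
    clip = torch.stack([frame_tensors[i] for i in idx.tolist()], dim=1)
    # Add the batch axis: [1, 3, T, H, W].
    return clip.unsqueeze(0)
```

With the transform above, usage would look like `clip = frames_to_clip([transform(f) for f in frames])`, where `frames` is a list of PIL images from roughly one second of video.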

---

## Training Details

### Knowledge Distillation

```
Teacher: Video Swin-T (27.86M, 98.05% acc)
    │
    │  Soft Labels (Temperature=4.0)
    ▼
Student: BaramNuri (14.20M)
    │
    │  L = 0.5 * L_hard + 0.5 * L_soft
    ▼
Result: 96.17% acc (98% of teacher accuracy)
```

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.05 |
| Batch Size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL Divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |
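
The loss row combines cross-entropy on the ground-truth labels with KL divergence against the teacher's temperature-softened outputs. Below is a minimal sketch of such a loss using the temperature and alpha from the table. The function name `distillation_loss` and the `temperature ** 2` scaling of the soft term (the convention from Hinton et al.) are assumptions; the source does not say whether the original training code applies that scaling.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 4.0, alpha: float = 0.5):
    """L = alpha * L_hard + (1 - alpha) * L_soft (illustrative sketch)."""
    # Hard-label term: standard cross-entropy against ground-truth classes.
    hard = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL(teacher || student) on temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # assumed scaling, keeps gradient magnitudes comparable
    return alpha * hard + (1 - alpha) * soft
```

During training the teacher logits would typically be computed under `torch.no_grad()` so that only the student receives gradients.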

---

## Deployment

### Server Deployment (GPU)

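```python
import torch
from model import BaramNuri

model = BaramNuri(num_classes=5)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model = model.cuda().eval()

# FP16 for faster inference; inputs must also be FP16 and on the GPU,
# e.g. model(video.cuda().half())
model = model.half()
```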

### Edge Deployment (INT8 Quantization)

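```python
import torch
from torch.ao.quantization import quantize_dynamic

# Dynamic quantization runs on CPU: start from the FP32 model,
# not the .half() GPU copy
model_int8 = quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# Model size: ~13MB
```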

### ONNX Export

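```python
import torch

# model: FP32 BaramNuri in eval mode, on the same device as dummy_input
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}, 'logits': {0: 'batch'}}
)
```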

---

## Use Cases

1. **Fleet Management**: Monitor driver behavior in commercial vehicles
2. **Insurance Telematics**: Risk assessment based on driving behavior
3. **ADAS Integration**: Advanced driver assistance systems
4. **Safety Research**: Analyze driving patterns and fatigue

---

## Limitations

- Trained on data from Korean driving environments
- Requires a front-facing camera pointed at the driver
- Performs best with 30 fps input
- May require fine-tuning for other camera angles

---

## Citation

```bibtex
@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}
```

---

## License

This model is released under the [Apache 2.0 License](LICENSE).

---

## Acknowledgments

- Video Swin Transformer: Liu et al. (CVPR 2022)
- Knowledge Distillation: Hinton et al. (2015)
- Mamba: Gu & Dao (2023); S4: Gu et al. (2022)

---

<div align="center">

**BaramNuri (바람누리)** - AI for safer driving

Made with care by C-Team

</div>