# BaramNuri (바람누리) - Driver Behavior Detection Model

> 바람누리 | "Wind that watches over the world"

A lightweight AI model for detecting abnormal driver behavior.

## Model Description

**BaramNuri (바람누리)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-vehicle camera video.

### Key Features

- **Lightweight**: 49% fewer parameters (14.20M) than the teacher model (27.86M)
- **High performance**: retains 98% of the teacher's performance via knowledge distillation
- **Real-time**: deployable on edge devices (INT8: ~13MB)
- **5-class classification**: normal, drowsy driving, searching for objects, phone use, driver assault
## Architecture

```
Input: [B, 3, 30, 224, 224]  (1-second clip, 30 fps)
        │
        ▼
┌─────────────────────────────────────┐
│ Video Swin-T (Stages 1-3)           │ ← Kinetics-400
│  Shifted Window Attention           │   pretrained
│  Output: 384-dim features           │
└─────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────┐
│ Selective SSM Block (x2)            │ ← Mamba-style
│  - 1D conv for local context        │   temporal
│  - Selective state space            │   modeling
│  - Input-dependent B, C, delta      │
└─────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────┐
│ Classification Head                 │
│  LayerNorm → Dropout → Linear       │
└─────────────────────────────────────┘
        │
        ▼
Output: [B, 5]  (5-class logits)
```
## Why This Architecture?

| Component | Purpose | Benefit |
|---|---|---|
| Video Swin (Stages 1-3) | Spatial feature extraction | Proven performance on video |
| Stage 4 removal | 55% parameter reduction | Lightweight without quality loss |
| Selective SSM | Temporal modeling | O(n) complexity vs O(n²) attention |
| Knowledge distillation | Performance retention | Learn from larger teacher model |
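The O(n) claim for the Selective SSM comes from its recurrent form: instead of comparing every pair of timesteps as attention does, a state-space model updates a fixed-size hidden state once per step, with B, C, and delta computed from the input. The sketch below illustrates that recurrence only; the shapes and the exponential-Euler discretization are illustrative assumptions, not the model's actual implementation:

```python
import torch

def selective_scan(x, A, B, C, delta):
    """Linear-time selective state-space recurrence (simplified sketch).

    Illustrative shapes:
      x:     [batch, length, dim]   input sequence
      A:     [dim, n]               learned state decay (negative for stability)
      B:     [batch, length, n]     input-dependent input projection
      C:     [batch, length, n]     input-dependent output projection
      delta: [batch, length, dim]   input-dependent step size (> 0)
    """
    batch, length, dim = x.shape
    h = torch.zeros(batch, dim, A.shape[1])      # fixed-size hidden state
    outputs = []
    for t in range(length):                      # one update per step: O(length)
        dt = delta[:, t].unsqueeze(-1)           # [batch, dim, 1]
        A_bar = torch.exp(dt * A)                # discretized decay [batch, dim, n]
        B_bar = dt * B[:, t].unsqueeze(1)        # discretized input [batch, dim, n]
        h = A_bar * h + B_bar * x[:, t].unsqueeze(-1)
        y = (h * C[:, t].unsqueeze(1)).sum(dim=-1)   # readout: [batch, dim]
        outputs.append(y)
    return torch.stack(outputs, dim=1)           # [batch, length, dim]
```

Because B, C, and delta depend on the current input, the state update is "selective": the block can choose per-frame what to write into and read out of its memory, while the cost still grows linearly with clip length.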
## Performance

### Classification Metrics

| Metric | Score |
|---|---|
| Accuracy | 96.17% |
| Macro F1 | 0.9504 |
| Precision | 0.95 |
| Recall | 0.95 |
### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| 정상 (Normal) | 0.93 | 0.93 | 0.93 |
| 졸음운전 (Drowsy) | 0.98 | 0.97 | 0.97 |
| 물건찾기 (Searching) | 0.93 | 0.95 | 0.94 |
| 휴대폰 사용 (Phone) | 0.94 | 0.93 | 0.94 |
| 운전자 폭행 (Assault) | 0.99 | 0.99 | 0.99 |
### Comparison with Teacher

| Metric | Teacher | BaramNuri | Comparison |
|---|---|---|---|
| Parameters | 27.86M | 14.20M | -49% |
| Model size (FP32) | ~106 MB | ~54 MB | -49% |
| Model size (INT8) | ~26 MB | ~13 MB | -50% |
| Accuracy | 98.05% | 96.17% | 98.1% retained |
| Macro F1 | 0.9757 | 0.9504 | 97.4% retained |
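The "retained" figures in the last column are simply student-to-teacher ratios, which can be reproduced directly from the numbers in the table:

```python
teacher_acc, student_acc = 98.05, 96.17
teacher_f1, student_f1 = 0.9757, 0.9504

print(f"Accuracy retained: {student_acc / teacher_acc:.1%}")   # 98.1%
print(f"Macro F1 retained: {student_f1 / teacher_f1:.1%}")     # 97.4%
print(f"Parameter reduction: {1 - 14.20 / 27.86:.0%}")         # 49%
```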
## Quick Start

### Installation

```bash
pip install torch torchvision
```

### Inference

```python
import torch
from model import BaramNuri

# Load the model and checkpoint weights
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# One 1-second clip: [batch, channels, frames, height, width]
video = torch.randn(1, 3, 30, 224, 224)
with torch.no_grad():
    logits = model(video)
    probs = torch.softmax(logits, dim=-1)
    pred_class = probs.argmax(dim=-1).item()

class_names = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
```
### With Prediction Helper

```python
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")
```
## Input Specification

| Parameter | Value |
|---|---|
| Format | `[B, C, T, H, W]` (BCTHW) |
| Channels | 3 (RGB) |
| Frames | 30 (1 second at 30 fps) |
| Resolution | 224 x 224 |
| Normalization | ImageNet mean/std |
### Preprocessing

```python
from torchvision import transforms

# Per-frame transform: resize, convert to tensor, normalize with ImageNet stats
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
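The transform above operates per frame; assembling a full clip into the model's `[B, C, T, H, W]` layout still needs a stacking step. A minimal sketch in plain torch, under the assumption that frames are already decoded and resized to 224x224 uint8 HWC:

```python
import torch

# ImageNet statistics, matching the Normalize values above
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)

def frames_to_bcthw(frames: torch.Tensor) -> torch.Tensor:
    """frames: [T, H, W, 3] uint8 clip -> [1, 3, T, H, W] normalized float tensor."""
    x = frames.float() / 255.0     # scale to [0, 1], as ToTensor does
    x = x.permute(3, 0, 1, 2)      # THWC -> CTHW
    x = (x - MEAN) / STD           # channel-wise ImageNet normalization
    return x.unsqueeze(0)          # add batch dim -> BCTHW

clip = torch.randint(0, 256, (30, 224, 224, 3), dtype=torch.uint8)
print(frames_to_bcthw(clip).shape)  # torch.Size([1, 3, 30, 224, 224])
```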
## Training Details

### Knowledge Distillation

```
Teacher: Video Swin-T (27.86M, 98.05% acc)
        │
        │  Soft labels (temperature = 4.0)
        ▼
Student: BaramNuri (14.20M)
        │
        │  L = 0.5 * L_hard + 0.5 * L_soft
        ▼
Result: 96.17% acc (98% of teacher performance)
```
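The combined loss from the diagram (alpha = 0.5, T = 4.0) can be sketched as follows. The T² scaling of the soft term is the standard convention from Hinton et al. (2015) and is an assumption here, not something the card specifies:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """L = alpha * L_hard + (1 - alpha) * L_soft, as in the diagram above."""
    # Hard loss: ordinary cross-entropy against ground-truth labels
    l_hard = F.cross_entropy(student_logits, labels)
    # Soft loss: KL divergence between temperature-softened distributions;
    # the T^2 factor compensates for the 1/T^2 gradient scaling (assumption)
    l_soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * l_hard + (1 - alpha) * l_soft
```

With alpha = 0.5 the student is pulled equally toward the ground-truth labels and toward the teacher's softened class distribution, which carries inter-class similarity information the one-hot labels lack.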
### Training Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 1e-4 |
| Weight decay | 0.05 |
| Batch size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |
## Deployment

### Server Deployment (GPU)

```python
model = BaramNuri(num_classes=5)
model.load_state_dict(torch.load('baramnuri_beta.pth')['model_state_dict'])
model = model.cuda().eval()
model = model.half()  # FP16 for faster GPU inference
```

### Edge Deployment (INT8 Quantization)

```python
import torch.quantization as quant

model_int8 = quant.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```
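Dynamic quantization as shown above stores `nn.Linear` weights as INT8. A quick way to sanity-check the size reduction on any module, using a toy stand-in here since the full model is not needed to demonstrate the effect:

```python
import io
import torch
import torch.nn as nn
import torch.quantization as quant

def size_bytes(module: nn.Module) -> int:
    """Serialized state_dict size: a rough proxy for on-disk model size."""
    buf = io.BytesIO()
    torch.save(module.state_dict(), buf)
    return buf.tell()

# Toy stand-in built from Linear layers, the layer type quantize_dynamic targets
toy = nn.Sequential(nn.Linear(384, 256), nn.ReLU(), nn.Linear(256, 5))
toy_int8 = quant.quantize_dynamic(toy, {nn.Linear}, dtype=torch.qint8)

print(f"FP32: {size_bytes(toy)} bytes, INT8: {size_bytes(toy_int8)} bytes")
```

The quantized module still runs on CPU with float inputs and outputs; only the Linear weights (and their matmuls) use INT8 internally.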
### ONNX Export

```python
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}},
)
```
## Use Cases
- Fleet Management: Monitor driver behavior in commercial vehicles
- Insurance Telematics: Risk assessment based on driving behavior
- ADAS Integration: Advanced driver assistance systems
- Safety Research: Analyze driving patterns and fatigue
## Limitations
- Trained on Korean driving environment data
- Requires frontal camera facing the driver
- Optimal performance at 30fps input
- May require fine-tuning for different camera angles
## Citation

```bibtex
@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}
```
## License
This model is released under the Apache 2.0 License.
## Acknowledgments
- Video Swin Transformer: Liu et al. (CVPR 2022)
- Knowledge Distillation: Hinton et al. (2015)
- Mamba/S4: Gu & Dao (2023)
**바람누리 (BaramNuri)** - AI for safer driving

Made with care by C-Team