
BaramNuri (바람누리) - Driver Behavior Detection Model

바람누리 | Wind that watches over the world

A lightweight AI model for detecting abnormal driver behavior



Model Description

**BaramNuri (바람누리)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-vehicle camera video.

Key Features

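  • Lightweight: 14.20M parameters, 49% fewer than the teacher model (27.86M)
  • High accuracy: retains 98% of the teacher's accuracy through knowledge distillation
  • Real-time: deployable on edge devices (INT8: ~13 MB)
  • Five-class classification: normal, drowsy driving, searching for objects, phone use, driver assault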

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     BaramNuri Architecture                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Input: [B, 3, 30, 224, 224] (1 s clip, 30 fps)                │
│                         │                                        │
│                         ▼                                        │
│   ┌─────────────────────────────────────┐                       │
│   │     Video Swin-T (Stage 1-3)        │  ← Kinetics-400       │
│   │     Shifted Window Attention         │     Pretrained        │
│   │     Output: 384 dim features        │                       │
│   └─────────────────────────────────────┘                       │
│                         │                                        │
│                         ▼                                        │
│   ┌─────────────────────────────────────┐                       │
│   │     Selective SSM Block (x2)        │  ← Mamba-style        │
│   │     - 1D Conv for local context     │     Temporal          │
│   │     - Selective state space         │     Modeling          │
│   │     - Input-dependent B, C, delta   │                       │
│   └─────────────────────────────────────┘                       │
│                         │                                        │
│                         ▼                                        │
│   ┌─────────────────────────────────────┐                       │
│   │     Classification Head             │                       │
│   │     LayerNorm → Dropout → Linear    │                       │
│   └─────────────────────────────────────┘                       │
│                         │                                        │
│                         ▼                                        │
│   Output: [B, 5] (5-class logits)                               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Why This Architecture?

| Component | Purpose | Benefit |
|---|---|---|
| Video Swin (Stage 1-3) | Spatial feature extraction | Proven performance on video |
| Stage 4 Removal | 55% parameter reduction | Lightweight without quality loss |
| Selective SSM | Temporal modeling | O(n) complexity vs O(n²) attention |
| Knowledge Distillation | Performance retention | Learn from larger teacher model |
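
To make the O(n) claim for the Selective SSM row concrete, here is a minimal, dependency-free sketch of a selective scan over a scalar sequence. It is not BaramNuri's implementation: the name `selective_scan`, the scalar hidden state, and the toy weights used to derive delta, B and C from the input are all assumptions made for illustration. The point is only that one pass over T steps with a fixed-size state is enough, whereas self-attention compares every pair of steps.

```python
import math


def softplus(v: float) -> float:
    """Smooth positive function, used here to keep the step size delta > 0."""
    return math.log1p(math.exp(v))


def selective_scan(xs, a=-1.0, w_delta=0.5, w_b=1.0, w_c=1.0):
    """Toy selective state-space scan over a scalar sequence (illustrative only).

    delta, B and C are recomputed from every input x_t, which is what makes
    the scan "selective". All weights are hypothetical constants.
    """
    h = 0.0  # fixed-size hidden state carried across time steps
    ys = []
    for x in xs:  # single pass over the sequence: O(T)
        delta = softplus(w_delta * x)  # input-dependent step size
        b = w_b * x                    # input-dependent input gate
        c = w_c * x                    # input-dependent output gate
        a_bar = math.exp(delta * a)    # discretised decay, in (0, 1) when a < 0
        h = a_bar * h + delta * b * x  # state update
        ys.append(c * h)               # readout
    return ys


outputs = selective_scan([1.0, 1.0, 1.0])
```

With a constant input the state accumulates while decaying, so each output is larger than the previous one. A real block would use vector-valued states and learned projections.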

Performance

Classification Metrics

| Metric | Score |
|---|---|
| Accuracy | 96.17% |
| Macro F1 | 0.9504 |
| Precision | 0.95 |
| Recall | 0.95 |

Per-Class Performance

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Normal (정상) | 0.93 | 0.93 | 0.93 |
| Drowsy (졸음운전) | 0.98 | 0.97 | 0.97 |
| Searching (물건찾기) | 0.93 | 0.95 | 0.94 |
| Phone (휴대폰 사용) | 0.94 | 0.93 | 0.94 |
| Assault (운전자 폭행) | 0.99 | 0.99 | 0.99 |
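
Macro F1 is the unweighted mean of the per-class F1 scores. The short check below recomputes it from the rounded values in the table above. Because each per-class score is rounded to two decimals, the result can differ from the reported 0.9504 by up to about 0.005. The English class keys are just labels for this sketch.

```python
# Per-class F1 scores as reported (rounded to two decimals)
per_class_f1 = {
    "Normal": 0.93,
    "Drowsy": 0.97,
    "Searching": 0.94,
    "Phone": 0.94,
    "Assault": 0.99,
}

# Macro average: every class counts equally, regardless of its sample count
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"Macro F1 from rounded per-class scores: {macro_f1:.3f}")
```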

Comparison with Teacher

| Metric | Teacher | BaramNuri | Comparison |
|---|---|---|---|
| Parameters | 27.86M | 14.20M | -49% |
| Model Size (FP32) | ~106 MB | ~54 MB | -49% |
| Model Size (INT8) | ~26 MB | ~13 MB | -50% |
| Accuracy | 98.05% | 96.17% | 98.1% retained |
| Macro F1 | 0.9757 | 0.9504 | 97.4% retained |
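
The comparison figures follow from the parameter counts and scores by simple arithmetic. The sketch below reproduces them under two assumptions that the card does not state: sizes count weights only, and "MB" means 2^20 bytes. The INT8 figure is an idealisation in which every parameter takes one byte.

```python
teacher_params = 27.86e6
student_params = 14.20e6


def size_mib(num_params: float, bytes_per_param: int) -> float:
    """Weights-only size in MiB (assumes 1 MiB = 2**20 bytes)."""
    return num_params * bytes_per_param / 2**20


reduction = 1 - student_params / teacher_params  # fraction of parameters removed
fp32_mib = size_mib(student_params, 4)           # 4 bytes per FP32 weight
int8_mib = size_mib(student_params, 1)           # 1 byte per INT8 weight (idealised)
acc_retained = 96.17 / 98.05                     # student accuracy / teacher accuracy
f1_retained = 0.9504 / 0.9757                    # student macro F1 / teacher macro F1

print(f"Parameter reduction: {reduction:.1%}")
print(f"FP32 size: {fp32_mib:.1f} MiB, INT8 size: {int8_mib:.1f} MiB")
print(f"Accuracy retained: {acc_retained:.1%}, macro F1 retained: {f1_retained:.1%}")
```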

Quick Start

Installation

pip install torch torchvision

Inference

import torch
from model import BaramNuri

# Load model
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare input (1 second video, 30fps, 224x224)
# Shape: [batch, channels, frames, height, width]
video = torch.randn(1, 3, 30, 224, 224)

# Inference
with torch.no_grad():
    logits = model(video)
    probs = torch.softmax(logits, dim=-1)
    pred_class = probs.argmax(dim=-1).item()

# Class names, indexed in the order of the model's five output classes
class_names = ["Normal", "Drowsy", "Searching", "Phone", "Assault"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")

With Prediction Helper

# Single prediction with confidence
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")

Input Specification

| Parameter | Value |
|---|---|
| Format | [B, C, T, H, W] (BCTHW) |
| Channels | 3 (RGB) |
| Frames | 30 (1 second at 30fps) |
| Resolution | 224 x 224 |
| Normalization | ImageNet mean/std |
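
The model consumes fixed clips of 30 frames, so a continuous camera stream has to be cut into windows first. The sketch below shows one way to do that with a ring buffer. `ClipBuffer` and its `stride` parameter are hypothetical helpers, not part of the BaramNuri code, and overlapping windows are an assumption: the card only specifies 30 frames at 30fps.

```python
from collections import deque


class ClipBuffer:
    """Collects frames and emits fixed-length clips (hypothetical helper)."""

    def __init__(self, clip_len: int = 30, stride: int = 30):
        self.clip_len = clip_len
        self.stride = stride
        self._frames = deque(maxlen=clip_len)  # oldest frames drop off automatically
        self._since_last = 0                   # frames seen since the last emitted clip

    def push(self, frame):
        """Add one frame; return a list of clip_len frames when a clip is due, else None."""
        self._frames.append(frame)
        self._since_last += 1
        if len(self._frames) == self.clip_len and self._since_last >= self.stride:
            self._since_last = 0
            return list(self._frames)
        return None


# Two seconds of dummy "frames" (integers stand in for images),
# with a new clip every 15 frames once the buffer is full.
buffer = ClipBuffer(clip_len=30, stride=15)
clips = [clip for clip in (buffer.push(i) for i in range(60)) if clip is not None]
```

Each emitted clip would then be preprocessed frame by frame and stacked into the [B, C, T, H, W] layout listed above. A smaller stride gives more frequent predictions at the cost of more inference calls.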

Preprocessing

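import torch
from torchvision import transforms

# Per-frame transform: resize, convert to a [C, H, W] float tensor, normalize
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# Illustrative helper (not part of the released code): stack 30 RGB PIL frames
def frames_to_clip(frames):
    clip = torch.stack([transform(f) for f in frames], dim=1)  # [C, T, H, W]
    return clip.unsqueeze(0)  # add batch dimension: [1, C, T, H, W]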

Training Details

Knowledge Distillation

Teacher: Video Swin-T (27.86M, 98.05% acc)
    │
    │  Soft Labels (Temperature=4.0)
    ▼
Student: BaramNuri (14.20M)
    │
    │  L = 0.5 * L_hard + 0.5 * L_soft
    ▼
Result: 96.17% acc (98% of teacher performance)

Training Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.05 |
| Batch Size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL Divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |
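
The loss L = 0.5 * L_hard + 0.5 * L_soft with temperature 4.0 can be worked through for a single sample. The sketch below uses plain Python so the arithmetic is visible. It is not the project's training code, and two details are assumptions: the soft term is scaled by T², following the convention in Hinton et al., and `alpha` weights the hard term while `1 - alpha` weights the soft term.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Single-sample KD loss: alpha * CE + (1 - alpha) * T^2 * KL (a sketch)."""
    # Hard term: cross-entropy against the ground-truth label at temperature 1
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened distributions
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    soft = kl * temperature ** 2  # T^2 scaling is an assumption, see lead-in
    return alpha * hard + (1 - alpha) * soft


student = [2.0, 0.0, 0.0, 0.0, 0.0]
loss_agree = distillation_loss(student, [2.0, 0.0, 0.0, 0.0, 0.0], label=0)
loss_disagree = distillation_loss(student, [0.0, 2.0, 0.0, 0.0, 0.0], label=0)
```

When the teacher and student logits match, the soft term vanishes and only half of the cross-entropy remains. Any disagreement with the teacher adds a positive KL penalty.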

Deployment

Server Deployment (GPU)

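import torch
from model import BaramNuri

model = BaramNuri(num_classes=5)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model = model.cuda().eval()

# FP16 for faster inference; inputs must then also be moved with .cuda().half()
model = model.half()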

Edge Deployment (INT8 Quantization)

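import torch
from torch.ao.quantization import quantize_dynamic

# Dynamic quantization runs on CPU and expects FP32 weights.
# Note that .cpu().float() moves `model` itself back to CPU/FP32.
model_int8 = quantize_dynamic(
    model.cpu().float(), {torch.nn.Linear}, dtype=torch.qint8
)
# Model size: ~13MB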

ONNX Export

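import torch

# Export an FP32 model in eval mode, on the same device as dummy_input
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}}  # allow a variable batch size
)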

Use Cases

  1. Fleet Management: Monitor driver behavior in commercial vehicles
  2. Insurance Telematics: Risk assessment based on driving behavior
  3. ADAS Integration: Advanced driver assistance systems
  4. Safety Research: Analyze driving patterns and fatigue

Limitations

  • Trained on Korean driving environment data
  • Requires frontal camera facing the driver
  • Optimal performance at 30fps input
  • May require fine-tuning for different camera angles

Citation

@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}

License

This model is released under the Apache 2.0 License.


Acknowledgments

  • Video Swin Transformer: Liu et al. (CVPR 2022)
  • Knowledge Distillation: Hinton et al. (2015)
  • Mamba/S4: Gu & Dao (2023)

BaramNuri - AI for safer driving

Made with care by C-Team
