# BaramNuri (바람누리) - Driver Behavior Detection Model

## Model Description

**BaramNuri (바람누리)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-vehicle camera footage.
## Key Features

- **Lightweight**: 49% fewer parameters (14.20M) than the teacher model (27.86M)
- **High performance**: retains 98% of the teacher's performance via knowledge distillation
- **Real-time**: deployable on edge devices (INT8: ~13MB)
- **5-class classification**: normal, drowsy driving, searching for objects, phone use, driver assault
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      BaramNuri Architecture                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Input: [B, 3, 30, 224, 224]  (1-second clip, 30fps)             │
│                    │                                             │
│                    ▼                                             │
│  ┌─────────────────────────────────────┐                         │
│  │  Video Swin-T (Stage 1-3)           │  ← Kinetics-400         │
│  │  Shifted Window Attention           │    Pretrained           │
│  │  Output: 384-dim features           │                         │
│  └─────────────────────────────────────┘                         │
│                    │                                             │
│                    ▼                                             │
│  ┌─────────────────────────────────────┐                         │
│  │  Selective SSM Block (x2)           │  ← Mamba-style          │
│  │  - 1D Conv for local context        │    Temporal             │
│  │  - Selective state space            │    Modeling             │
│  │  - Input-dependent B, C, delta      │                         │
│  └─────────────────────────────────────┘                         │
│                    │                                             │
│                    ▼                                             │
│  ┌─────────────────────────────────────┐                         │
│  │  Classification Head                │                         │
│  │  LayerNorm → Dropout → Linear       │                         │
│  └─────────────────────────────────────┘                         │
│                    │                                             │
│                    ▼                                             │
│  Output: [B, 5]  (5-class logits)                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
## Why This Architecture?
| Component | Purpose | Benefit |
|---|---|---|
| Video Swin (Stage 1-3) | Spatial feature extraction | Proven performance on video |
| Stage 4 Removal | 55% parameter reduction | Lightweight without quality loss |
| Selective SSM | Temporal modeling | O(n) complexity vs O(n²) attention |
| Knowledge Distillation | Performance retention | Learn from larger teacher model |
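The Selective SSM block described above (a depthwise 1D convolution for local context, followed by a state-space recurrence whose B, C, and delta parameters are computed from the input) can be sketched roughly as follows. This is an illustrative simplification written for this card, not the released implementation; layer sizes and the parameterization of `A` are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSMBlock(nn.Module):
    """Simplified Mamba-style block: depthwise 1D conv, then a selective
    state-space recurrence with input-dependent B, C, and delta."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.d_state = d_state
        # Depthwise 1D convolution captures local temporal context
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3,
                              padding=1, groups=d_model)
        # Project each timestep to its own B, C (d_state each) and delta (scalar)
        self.to_params = nn.Linear(d_model, 2 * d_state + 1)
        # Negative state-transition matrix keeps the recurrence stable
        self.A = nn.Parameter(-torch.rand(d_model, d_state))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, time, d_model]
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        B_in, C_in, delta = self.to_params(x).split(
            [self.d_state, self.d_state, 1], dim=-1)
        delta = F.softplus(delta)                         # [B, T, 1] step sizes
        A_bar = torch.exp(delta.unsqueeze(-1) * self.A)   # [B, T, D, N]

        # Linear-time scan over the sequence: O(T) instead of O(T^2) attention
        h = x.new_zeros(x.size(0), x.size(2), self.d_state)  # [B, D, N]
        ys = []
        for t in range(x.size(1)):
            # Discretized state update: h = A_bar*h + (delta*x)*B
            h = (A_bar[:, t] * h
                 + (delta[:, t] * x[:, t]).unsqueeze(-1) * B_in[:, t].unsqueeze(1))
            # Read out the state through the input-dependent C
            ys.append((h * C_in[:, t].unsqueeze(1)).sum(dim=-1))
        return torch.stack(ys, dim=1)  # [B, T, D]
```

Because each timestep's state depends only on the previous state, the cost grows linearly with clip length, which is the O(n) advantage the table refers to.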
## Performance

### Classification Metrics
| Metric | Score |
|---|---|
| Accuracy | 96.17% |
| Macro F1 | 0.9504 |
| Precision | 0.95 |
| Recall | 0.95 |
### Per-Class Performance
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| 정상 (Normal) | 0.93 | 0.93 | 0.93 |
| 졸음운전 (Drowsy) | 0.98 | 0.97 | 0.97 |
| 물건찾기 (Searching) | 0.93 | 0.95 | 0.94 |
| 휴대폰 사용 (Phone) | 0.94 | 0.93 | 0.94 |
| 운전자 폭행 (Assault) | 0.99 | 0.99 | 0.99 |
### Comparison with Teacher
| Metric | Teacher | BaramNuri | Comparison |
|---|---|---|---|
| Parameters | 27.86M | 14.20M | -49% |
| Model Size (FP32) | ~106 MB | ~54 MB | -49% |
| Model Size (INT8) | ~26 MB | ~13 MB | -50% |
| Accuracy | 98.05% | 96.17% | 98.1% retained |
| Macro F1 | 0.9757 | 0.9504 | 97.4% retained |
## Quick Start

### Installation

```bash
pip install torch torchvision
```
### Inference

```python
import torch
from model import BaramNuri

# Load model
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare input (1 second video, 30fps, 224x224)
# Shape: [batch, channels, frames, height, width]
video = torch.randn(1, 3, 30, 224, 224)

# Inference
with torch.no_grad():
    logits = model(video)
probs = torch.softmax(logits, dim=-1)
pred_class = probs.argmax(dim=-1).item()

# Class names
class_names = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
```
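For streaming camera input, the model can be fed one 1-second window at a time from a rolling frame buffer. The `ClipBuffer` helper below is a hypothetical sketch written for this card, not part of the released code:

```python
from collections import deque
import torch

class ClipBuffer:
    """Rolling buffer of preprocessed frames ([3, 224, 224] tensors);
    yields a [1, 3, 30, 224, 224] clip once 30 frames have arrived."""

    def __init__(self, num_frames: int = 30):
        self.frames = deque(maxlen=num_frames)

    def push(self, frame: torch.Tensor) -> None:
        # Oldest frame is dropped automatically once the buffer is full
        self.frames.append(frame)

    def ready(self) -> bool:
        return len(self.frames) == self.frames.maxlen

    def clip(self) -> torch.Tensor:
        # Stack along the time axis, then add the batch dimension
        return torch.stack(list(self.frames), dim=1).unsqueeze(0)
```

In a capture loop, push each preprocessed frame and call `model(buffer.clip())` whenever `buffer.ready()` is true (e.g. every 30 frames for non-overlapping windows).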
### With Prediction Helper

```python
# Single prediction with confidence
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")
```
## Input Specification
| Parameter | Value |
|---|---|
| Format | [B, C, T, H, W] (BCTHW) |
| Channels | 3 (RGB) |
| Frames | 30 (1 second at 30fps) |
| Resolution | 224 x 224 |
| Normalization | ImageNet mean/std |
### Preprocessing

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
```
## Training Details

### Knowledge Distillation

```
Teacher: Video Swin-T (27.86M, 98.05% acc)
        │
        │  Soft Labels (Temperature=4.0)
        ▼
Student: BaramNuri (14.20M)
        │
        │  L = 0.5 * L_hard + 0.5 * L_soft
        ▼
Result: 96.17% acc (98% of teacher performance)
```
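The objective above, `L = 0.5 * L_hard + 0.5 * L_soft` with temperature 4.0, corresponds to a standard knowledge-distillation loss (Hinton et al., 2015). A minimal sketch, not the exact training code:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.5):
    # Hard loss: cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    # Soft loss: KL divergence between temperature-softened distributions;
    # scaling by T*T keeps gradient magnitudes comparable across temperatures
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft
```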
### Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.05 |
| Batch Size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL Divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |
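The optimizer settings in the table map directly onto PyTorch's `AdamW`; the `model` below is a stand-in module, not the actual network:

```python
import torch

model = torch.nn.Linear(384, 5)  # placeholder for the student network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```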
## Deployment

### Server Deployment (GPU)

```python
model = BaramNuri(num_classes=5)
model.load_state_dict(torch.load('baramnuri_beta.pth')['model_state_dict'])
model = model.cuda().eval()

# FP16 for faster inference (inputs must also be cast: video = video.half())
model = model.half()
```
### Edge Deployment (INT8 Quantization)

```python
import torch.quantization as quant

model_int8 = quant.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# Model size: ~13MB
```
### ONNX Export

```python
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}}
)
```
## Use Cases
- Fleet Management: Monitor driver behavior in commercial vehicles
- Insurance Telematics: Risk assessment based on driving behavior
- ADAS Integration: Advanced driver assistance systems
- Safety Research: Analyze driving patterns and fatigue
## Limitations
- Trained on Korean driving environment data
- Requires frontal camera facing the driver
- Optimal performance at 30fps input
- May require fine-tuning for different camera angles
## Citation

```bibtex
@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}
```
## License
This model is released under the Apache 2.0 License.
## Acknowledgments
- Video Swin Transformer: Liu et al. (CVPR 2022)
- Knowledge Distillation: Hinton et al. (2015)
- Mamba/S4: Gu & Dao (2023)
*BaramNuri (바람누리) - AI for safe driving*

Made with care by C-Team