# BaramNuri (바람누리) - Driver Behavior Detection Model
<div align="center">
**바람누리** | *Wind that watches over the world*
A lightweight AI model for detecting abnormal driver behavior
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.8+-green.svg)](https://python.org)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org)
</div>
---
## Model Description
**BaramNuri (바람누리)** is a lightweight deep-learning model that detects abnormal driver behavior in real time from in-cabin camera footage.
### Key Features
- **Lightweight**: **49% fewer parameters** (14.20M) than the teacher model (27.86M)
- **High performance**: **98% of teacher performance retained** via knowledge distillation
- **Real-time**: deployable on edge devices (INT8: ~13MB)
- **5-class detection**: normal, drowsy driving, searching for objects, phone use, driver assault
---
## Architecture
```
Input: [B, 3, 30, 224, 224]   (1-second clip @ 30 fps)
          │
          ▼
┌────────────────────────────────────┐
│ Video Swin-T (Stages 1-3)          │  ← Kinetics-400 pretrained
│   Shifted-window attention         │
│   Output: 384-dim features         │
└────────────────────────────────────┘
          │
          ▼
┌────────────────────────────────────┐
│ Selective SSM Block (x2)           │  ← Mamba-style temporal modeling
│   - 1D conv for local context      │
│   - Selective state space          │
│   - Input-dependent B, C, delta    │
└────────────────────────────────────┘
          │
          ▼
┌────────────────────────────────────┐
│ Classification Head                │
│   LayerNorm → Dropout → Linear     │
└────────────────────────────────────┘
          │
          ▼
Output: [B, 5]   (5-class logits)
```
### Why This Architecture?
| Component | Purpose | Benefit |
|-----------|---------|---------|
| **Video Swin (Stage 1-3)** | Spatial feature extraction | Proven performance on video |
| **Stage 4 Removal** | 55% parameter reduction | Lightweight without quality loss |
| **Selective SSM** | Temporal modeling | O(n) complexity vs O(n²) attention |
| **Knowledge Distillation** | Performance retention | Learn from larger teacher model |
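The Selective SSM block is only described at a high level above. A minimal PyTorch sketch of what such a block might look like follows; the layer names, state size, initialization, and the sequential (rather than parallel) scan are all assumptions, not the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSMBlock(nn.Module):
    """Simplified Mamba-style block: depthwise 1D conv for local context,
    then a selective scan with input-dependent B, C, and delta."""
    def __init__(self, d_model=384, d_state=16, conv_kernel=4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Depthwise causal conv over the time axis
        self.conv = nn.Conv1d(d_model, d_model, conv_kernel,
                              padding=conv_kernel - 1, groups=d_model)
        # Input-dependent projections (the "selective" part)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        self.dt_proj = nn.Linear(d_model, d_model)
        # Log-parameterized state decay A (fixed, not input-dependent)
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                    # x: [B, T, D]
        bsz, T, _ = x.shape
        residual = x
        x = self.norm(x)
        xc = self.conv(x.transpose(1, 2))[..., :T].transpose(1, 2)
        x = F.silu(xc)
        dt = F.softplus(self.dt_proj(x))                     # [B, T, D]
        Bc, Cc = self.B_proj(x), self.C_proj(x)              # [B, T, N]
        A = -torch.exp(self.A_log)                           # [D, N], negative
        h = x.new_zeros(bsz, x.shape[-1], A.shape[1])        # state [B, D, N]
        ys = []
        for t in range(T):                                   # sequential scan
            dA = torch.exp(dt[:, t].unsqueeze(-1) * A)       # discretized decay
            dB = dt[:, t].unsqueeze(-1) * Bc[:, t].unsqueeze(1)
            h = dA * h + dB * x[:, t].unsqueeze(-1)
            ys.append((h * Cc[:, t].unsqueeze(1)).sum(-1))   # readout: [B, D]
        y = torch.stack(ys, dim=1)                           # [B, T, D]
        return residual + self.out_proj(y)

block = SelectiveSSMBlock(d_model=384)
feats = torch.randn(2, 30, 384)    # [batch, frames, Swin features]
out = block(feats)
print(out.shape)                   # torch.Size([2, 30, 384])
```

The O(n) cost claimed in the table comes from this scan touching each timestep once, versus the O(n²) pairwise comparisons of attention.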
---
## Performance
### Classification Metrics
| Metric | Score |
|--------|-------|
| **Accuracy** | 96.17% |
| **Macro F1** | 0.9504 |
| **Precision** | 0.95 |
| **Recall** | 0.95 |
### Per-Class Performance
| Class | Precision | Recall | F1-Score |
|-------|:---------:|:------:|:--------:|
| 정상 (Normal) | 0.93 | 0.93 | 0.93 |
| 졸음운전 (Drowsy) | 0.98 | 0.97 | 0.97 |
| 물건찾기 (Searching) | 0.93 | 0.95 | 0.94 |
| 휴대폰 사용 (Phone) | 0.94 | 0.93 | 0.94 |
| 운전자 폭행 (Assault) | 0.99 | 0.99 | 0.99 |
### Comparison with Teacher
| Metric | Teacher | BaramNuri | Comparison |
|--------|---------|-----------|------------|
| **Parameters** | 27.86M | 14.20M | **-49%** |
| **Model Size (FP32)** | ~106 MB | ~54 MB | **-49%** |
| **Model Size (INT8)** | ~26 MB | ~13 MB | **-50%** |
| **Accuracy** | 98.05% | 96.17% | 98.1% retained |
| **Macro F1** | 0.9757 | 0.9504 | 97.4% retained |
---
## Quick Start
### Installation
```bash
pip install torch torchvision
```
### Inference
```python
import torch
from model import BaramNuri

# Load model
model = BaramNuri(num_classes=5, pretrained=False)
checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare input (1-second clip, 30 fps, 224x224)
# Shape: [batch, channels, frames, height, width]
video = torch.randn(1, 3, 30, 224, 224)

# Inference
with torch.no_grad():
    logits = model(video)
    probs = torch.softmax(logits, dim=-1)
    pred_class = probs.argmax(dim=-1).item()

# Class names (normal, drowsy driving, searching, phone use, assault)
class_names = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
```
### With Prediction Helper
```python
# Single prediction with confidence
result = model.predict(video)
print(f"Class: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")
```
---
## Input Specification
| Parameter | Value |
|-----------|-------|
| **Format** | `[B, C, T, H, W]` (BCTHW) |
| **Channels** | 3 (RGB) |
| **Frames** | 30 (1 second at 30fps) |
| **Resolution** | 224 x 224 |
| **Normalization** | ImageNet mean/std |
### Preprocessing
```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
```
---
## Training Details
### Knowledge Distillation
```
Teacher: Video Swin-T (27.86M, 98.05% acc)
    │  soft labels (temperature = 4.0)
    ▼
Student: BaramNuri (14.20M)
    │  L = 0.5 * L_hard + 0.5 * L_soft
    ▼
Result: 96.17% acc (~98% of teacher performance)
```
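The combined loss above can be written out as a standard Hinton-style distillation objective. A minimal sketch using the stated temperature (4.0) and alpha (0.5); the `T * T` gradient rescaling is an assumption from common KD practice, not confirmed by the source:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """L = alpha * L_hard + (1 - alpha) * L_soft (CE + temperature-scaled KL)."""
    # Hard loss: cross-entropy against ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    # Soft loss: KL(teacher || student) on temperature-softened distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)   # rescale so soft-loss gradients match the hard loss
    return alpha * hard + (1 - alpha) * soft

student_logits = torch.randn(8, 5)
teacher_logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```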
### Training Configuration
| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.05 |
| Batch Size | 96 (effective) |
| Epochs | 6 |
| Loss | CE + KL Divergence |
| Temperature | 4.0 |
| Alpha (hard/soft) | 0.5 |
---
## Deployment
### Server Deployment (GPU)
```python
model = BaramNuri(num_classes=5)
model.load_state_dict(torch.load('baramnuri_beta.pth')['model_state_dict'])
model = model.cuda().eval()
# FP16 for faster inference
model = model.half()
```
### Edge Deployment (INT8 Quantization)
```python
import torch.quantization as quant

model_int8 = quant.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# Model size: ~13MB
```
### ONNX Export
```python
dummy_input = torch.randn(1, 3, 30, 224, 224)
torch.onnx.export(
    model, dummy_input, "baramnuri.onnx",
    input_names=['video'],
    output_names=['logits'],
    dynamic_axes={'video': {0: 'batch'}}
)
```
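Because the model consumes fixed one-second clips, a streaming deployment needs a rolling frame buffer in front of it. A minimal sketch, assuming frames are already preprocessed to `[3, 224, 224]` tensors (`ClipBuffer` is illustrative, not part of the released code):

```python
import collections
import torch

class ClipBuffer:
    """Rolling buffer of the last 30 preprocessed frames; yields a
    [1, 3, 30, 224, 224] clip once a full second of video is available."""
    def __init__(self, num_frames=30):
        self.num_frames = num_frames
        self.frames = collections.deque(maxlen=num_frames)

    def push(self, frame):
        # frame: [3, 224, 224] tensor; oldest frame is evicted when full
        self.frames.append(frame)

    def clip(self):
        if len(self.frames) < self.num_frames:
            return None             # not enough frames buffered yet
        return torch.stack(list(self.frames), dim=1).unsqueeze(0)

buf = ClipBuffer()
for _ in range(30):
    buf.push(torch.randn(3, 224, 224))
clip = buf.clip()
print(clip.shape)   # torch.Size([1, 3, 30, 224, 224])
```

Running the model once per new frame gives a prediction every ~33 ms over a sliding one-second window; running it once per second gives non-overlapping windows at lower cost.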
---
## Use Cases
1. **Fleet Management**: Monitor driver behavior in commercial vehicles
2. **Insurance Telematics**: Risk assessment based on driving behavior
3. **ADAS Integration**: Advanced driver assistance systems
4. **Safety Research**: Analyze driving patterns and fatigue
---
## Limitations
- Trained on Korean driving environment data
- Requires frontal camera facing the driver
- Optimal performance at 30fps input
- May require fine-tuning for different camera angles
---
## Citation
```bibtex
@misc{baramnuri2025,
  title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
}
```
---
## License
This model is released under the [Apache 2.0 License](LICENSE).
---
## Acknowledgments
- Video Swin Transformer: Liu et al. (CVPR 2022)
- Knowledge Distillation: Hinton et al. (2015)
- Mamba: Gu & Dao (2023); S4: Gu et al. (2022)
---
<div align="center">
**BaramNuri (바람누리)** - AI for safer driving
Made with care by C-Team
</div>