Upload 3 files
Browse files

- README.md +535 -3
- config.json +52 -0
- pytorch_model.bin +3 -0
README.md
CHANGED
@@ -1,3 +1,535 @@
- ---
- license: apache-2.0
-
---
license: apache-2.0
language:
- ko
tags:
- video-classification
- driver-behavior
- video-swin-transformer
- pytorch
- safety
- autonomous-driving
metrics:
- accuracy
- f1
pipeline_tag: video-classification
datasets:
- custom
---

# 🚗 Driver Abnormal Behavior Detection Model

**Driver abnormal behavior detection**, based on the Video Swin Transformer.

A deep learning model that detects abnormal driver behavior in real time from in-vehicle camera footage.

## 📊 Model Performance

| Metric | Score |
|--------|-------|
| **Accuracy** | 95.51% |
| **Macro F1** | 0.9436 |
| **Inference Speed** | ~30 FPS (RTX 3090) |

### Per-Class Performance

| Class | Korean | Precision | Recall | F1-Score | Support |
|-------|--------|-----------|--------|----------|---------|
| 0 | 정상 (Normal) | 0.93 | 0.92 | 0.92 | 159,224 |
| 1 | 졸음운전 (Drowsy) | 0.99 | 0.98 | 0.98 | 619,450 |
| 2 | 물건찾기 (Searching) | 0.90 | 0.94 | 0.92 | 261,435 |
| 3 | 휴대폰 사용 (Phone) | 0.91 | 0.88 | 0.90 | 150,981 |
| 4 | 운전자 폭행 (Assault) | 1.00 | 1.00 | 1.00 | 179,972 |

---

## 🛠️ Installation

```bash
# Requires PyTorch 2.0+
pip install torch torchvision

# Additional dependencies
pip install opencv-python numpy

# (Optional) for downloading from the HuggingFace Hub
pip install huggingface_hub
```

---

## 🚀 Quick Start

### 1. Download and Load the Model

```python
import torch
from torchvision.models.video import swin3d_t

# ===== Option 1: load from a local file =====
model = swin3d_t(weights=None)
model.head = torch.nn.Linear(model.head.in_features, 5)  # 5 classes

state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()

# ===== Option 2: load from the HuggingFace Hub =====
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/driver-behavior-swin-t",
    filename="pytorch_model.bin"
)
state_dict = torch.load(model_path, map_location="cpu", weights_only=True)

model = swin3d_t(weights=None)
model.head = torch.nn.Linear(model.head.in_features, 5)
model.load_state_dict(state_dict)
model.eval()
```

### 2. Single-Video Inference

```python
import cv2
import torch
import numpy as np

# Class definitions
CLASS_NAMES = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
CLASS_NAMES_EN = ["Normal", "Drowsy Driving", "Searching Objects", "Phone Usage", "Driver Assault"]

def load_video_frames(video_path, num_frames=30, size=(224, 224)):
    """Extract and preprocess frames from a video."""
    cap = cv2.VideoCapture(video_path)
    frames = []

    while len(frames) < num_frames:
        ret, frame = cap.read()
        if not ret:
            break
        # BGR -> RGB
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Resize
        frame = cv2.resize(frame, size)
        frames.append(frame)

    cap.release()

    # If there are too few frames, repeat the last one
    while len(frames) < num_frames:
        frames.append(frames[-1] if frames else np.zeros((*size, 3), dtype=np.uint8))

    # [T, H, W, C] -> [C, T, H, W]
    frames = np.array(frames[:num_frames], dtype=np.float32)
    frames = frames.transpose(3, 0, 1, 2)  # [C, T, H, W]

    # Normalize to [0, 1]
    frames = frames / 255.0

    # ImageNet normalization
    mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1, 1)
    std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1, 1)
    frames = (frames - mean) / std

    return torch.FloatTensor(frames)

def predict(model, video_path, device="cuda"):
    """Run inference on a single video."""
    model = model.to(device)
    model.eval()

    # Load frames
    frames = load_video_frames(video_path)
    frames = frames.unsqueeze(0).to(device)  # [1, C, T, H, W]

    # Inference
    with torch.no_grad():
        outputs = model(frames)
        probs = torch.softmax(outputs, dim=1)
        pred_idx = torch.argmax(probs, dim=1).item()
        confidence = probs[0, pred_idx].item()

    return {
        "class_id": pred_idx,
        "class_name_ko": CLASS_NAMES[pred_idx],
        "class_name_en": CLASS_NAMES_EN[pred_idx],
        "confidence": confidence,
        "all_probabilities": {
            CLASS_NAMES[i]: probs[0, i].item()
            for i in range(len(CLASS_NAMES))
        }
    }

# Example usage
result = predict(model, "test_video.mp4")
print(f"Prediction: {result['class_name_ko']} ({result['confidence']:.2%})")
```

---

## 📹 Real-time Inference

```python
import cv2
import torch
import numpy as np
from collections import deque

class RealtimeDriverBehaviorDetector:
    """Real-time detector for abnormal driver behavior."""

    CLASS_NAMES = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]

    def __init__(self, model_path, device="cuda", window_size=30, stride=15):
        """
        Args:
            model_path: path to pytorch_model.bin
            device: 'cuda' or 'cpu'
            window_size: frames per analysis window (default 30 = 1 s @ 30 FPS)
            stride: sliding-window step (default 15 = 0.5 s)
        """
        self.device = device
        self.window_size = window_size
        self.stride = stride

        # Load the model
        from torchvision.models.video import swin3d_t
        self.model = swin3d_t(weights=None)
        self.model.head = torch.nn.Linear(self.model.head.in_features, 5)

        state_dict = torch.load(model_path, map_location="cpu", weights_only=True)
        self.model.load_state_dict(state_dict)
        self.model.to(device)
        self.model.eval()

        # Frame buffer
        self.frame_buffer = deque(maxlen=window_size)
        self.frame_count = 0

        # Normalization parameters
        self.mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1, 1)
        self.std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1, 1)

    def preprocess_frame(self, frame):
        """Preprocess a single frame."""
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (224, 224))
        return frame

    def predict(self):
        """Run inference on the frames currently in the buffer."""
        if len(self.frame_buffer) < self.window_size:
            return None

        # [T, H, W, C] -> [C, T, H, W]
        frames = np.array(list(self.frame_buffer), dtype=np.float32)
        frames = frames.transpose(3, 0, 1, 2) / 255.0
        frames = (frames - self.mean) / self.std

        # Inference
        with torch.no_grad():
            inputs = torch.FloatTensor(frames).unsqueeze(0).to(self.device)
            outputs = self.model(inputs)
            probs = torch.softmax(outputs, dim=1)
            pred_idx = torch.argmax(probs, dim=1).item()
            confidence = probs[0, pred_idx].item()

        return {
            "class_id": pred_idx,
            "class_name": self.CLASS_NAMES[pred_idx],
            "confidence": confidence,
            "is_abnormal": pred_idx != 0,  # 0 = Normal
            "probabilities": probs[0].cpu().numpy()
        }

    def process_frame(self, frame):
        """Process one frame (inference runs every `stride` frames)."""
        processed = self.preprocess_frame(frame)
        self.frame_buffer.append(processed)
        self.frame_count += 1

        # Run inference every `stride` frames
        if self.frame_count % self.stride == 0:
            return self.predict()
        return None

    def run_on_video(self, video_source=0, show_display=True):
        """
        Run real-time inference on a video source.

        Args:
            video_source: webcam index (0) or path to a video file
            show_display: whether to show the annotated frames
        """
        cap = cv2.VideoCapture(video_source)

        # Display colors (BGR)
        colors = {
            "정상": (0, 255, 0),          # green
            "졸음운전": (0, 165, 255),    # orange
            "물건찾기": (0, 255, 255),    # yellow
            "휴대폰 사용": (0, 0, 255),   # red
            "운전자 폭행": (255, 0, 255)  # magenta
        }

        current_result = None

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Inference
            result = self.process_frame(frame)
            if result:
                current_result = result

            # Display
            if show_display and current_result:
                label = current_result["class_name"]
                conf = current_result["confidence"]
                color = colors.get(label, (255, 255, 255))

                # Status overlay
                text = f"{label}: {conf:.1%}"
                cv2.putText(frame, text, (10, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.2, color, 3)

                # Warning when abnormal behavior is detected
                if current_result["is_abnormal"]:
                    cv2.putText(frame, "WARNING!", (10, 80),
                                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)

                cv2.imshow("Driver Behavior Detection", frame)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()


# ===== Example usage =====

# 1. Real-time inference from a webcam
detector = RealtimeDriverBehaviorDetector("pytorch_model.bin", device="cuda")
detector.run_on_video(video_source=0)  # webcam

# 2. Inference on a video file
detector.run_on_video(video_source="test_video.mp4")
```

---

## 🔧 Batch Inference

```python
import cv2
import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset, DataLoader

class VideoDataset(Dataset):
    """Dataset for batch processing of video files."""

    def __init__(self, video_paths, num_frames=30):
        self.video_paths = video_paths
        self.num_frames = num_frames
        self.mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1, 1)
        self.std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1, 1)

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        video_path = self.video_paths[idx]

        cap = cv2.VideoCapture(str(video_path))
        frames = []

        while len(frames) < self.num_frames:
            ret, frame = cap.read()
            if not ret:
                break
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frame = cv2.resize(frame, (224, 224))
            frames.append(frame)

        cap.release()

        while len(frames) < self.num_frames:
            frames.append(frames[-1] if frames else np.zeros((224, 224, 3), dtype=np.uint8))

        frames = np.array(frames[:self.num_frames], dtype=np.float32)
        frames = frames.transpose(3, 0, 1, 2) / 255.0
        frames = (frames - self.mean) / self.std

        return torch.FloatTensor(frames), str(video_path)


def batch_inference(model, video_folder, batch_size=8, device="cuda"):
    """
    Run batch inference on every video in a folder.

    Args:
        model: the loaded model
        video_folder: path to the folder of videos
        batch_size: batch size
        device: 'cuda' or 'cpu'

    Returns:
        List of prediction dictionaries, one per video
    """
    CLASS_NAMES = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]

    video_folder = Path(video_folder)
    video_paths = list(video_folder.glob("*.mp4")) + list(video_folder.glob("*.avi"))

    dataset = VideoDataset(video_paths)
    dataloader = DataLoader(dataset, batch_size=batch_size, num_workers=4)

    model = model.to(device)
    model.eval()

    results = []

    with torch.no_grad():
        for frames, paths in dataloader:
            frames = frames.to(device)
            outputs = model(frames)
            probs = torch.softmax(outputs, dim=1)
            preds = torch.argmax(probs, dim=1)

            for path, pred_idx, prob in zip(paths, preds, probs):
                results.append({
                    "video_path": path,
                    "class_id": pred_idx.item(),
                    "class_name": CLASS_NAMES[pred_idx.item()],
                    "confidence": prob[pred_idx].item()
                })

    return results

# Example usage
results = batch_inference(model, "./videos/", batch_size=16)
for r in results:
    print(f"{r['video_path']}: {r['class_name']} ({r['confidence']:.2%})")
```

---

## 📐 Input/Output Specification

### Input Format

| Parameter | Value |
|-----------|-------|
| **Shape** | `[batch, 3, 30, 224, 224]` |
| **Format** | `[B, C, T, H, W]` (Batch, Channel, Time, Height, Width) |
| **Channels** | RGB (not BGR) |
| **Normalization** | ImageNet (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) |
| **Value Range** | approximately [-2.5, 2.5] after normalization |

### Output Format

| Parameter | Value |
|-----------|-------|
| **Shape** | `[batch, 5]` |
| **Format** | Raw logits (apply softmax for probabilities) |
| **Classes** | 0 = 정상 (Normal), 1 = 졸음운전 (Drowsy), 2 = 물건찾기 (Searching), 3 = 휴대폰 사용 (Phone), 4 = 운전자 폭행 (Assault) |

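To sanity-check this contract, the minimal sketch below builds a correctly shaped dummy input and runs a forward pass. It is illustrative only: the model is rebuilt with random weights here, so load `pytorch_model.bin` as in Quick Start before interpreting the outputs.

```python
import torch
from torchvision.models.video import swin3d_t

# Rebuild the 5-class model (random weights; load pytorch_model.bin for real use)
model = swin3d_t(weights=None)
model.head = torch.nn.Linear(model.head.in_features, 5)
model.eval()

# Correctly shaped dummy input: [B, C, T, H, W]
dummy = torch.randn(1, 3, 30, 224, 224)

with torch.no_grad():
    logits = model(dummy)                 # raw logits, shape [1, 5]
    probs = torch.softmax(logits, dim=1)  # class probabilities

print(logits.shape)  # torch.Size([1, 5])
```
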
---

## ⚙️ Model Architecture

```
VideoSwinTransformer (swin3d_t)
├── patch_embed: PatchEmbed3d
│   └── proj: Conv3d(3, 96, kernel_size=(2,4,4), stride=(2,4,4))
├── layers: Sequential
│   ├── BasicLayer (depth=2, heads=3,  dim=96)
│   ├── BasicLayer (depth=2, heads=6,  dim=192)
│   ├── BasicLayer (depth=6, heads=12, dim=384)
│   └── BasicLayer (depth=2, heads=24, dim=768)
├── norm: LayerNorm(768)
├── avgpool: AdaptiveAvgPool3d(1)
└── head: Linear(768, 5)  # modified for 5 classes

Total Parameters: 27,855,851
Trainable Parameters: 27,855,851
```

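The parameter counts can be checked from torchvision alone. The short sketch below is an illustrative check, not repository code; the result should land close to the ~27.9M figure quoted above.

```python
import torch
from torchvision.models.video import swin3d_t

# Same configuration as above: swin3d_t backbone with a 5-class head
model = swin3d_t(weights=None)
model.head = torch.nn.Linear(model.head.in_features, 5)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters:     {total:,}")
print(f"Trainable parameters: {trainable:,}")
```
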
---

## 🏋️ Training Details

| Parameter | Value |
|-----------|-------|
| **Base Model** | swin3d_t (Kinetics-400 pretrained) |
| **Framework** | PyTorch 2.0+ |
| **GPUs** | 2x NVIDIA A6000 (48 GB each) |
| **Training Method** | DistributedDataParallel (DDP) |
| **Batch Size** | 128 effective (16 per GPU × 2 GPUs × 4 accumulation steps) |
| **Optimizer** | AdamW (lr=1e-3, weight_decay=1e-4) |
| **Scheduler** | OneCycleLR (pct_start=0.2, cosine annealing) |
| **Mixed Precision** | FP16 (torch.amp) |
| **Epochs** | 1 (of 5 planned) |

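The table corresponds roughly to the optimizer/scheduler/AMP setup sketched below. This is a hedged sketch of one possible implementation, not the actual training script: `model`, `train_loader`, and `NUM_EPOCHS` are placeholders, and the DDP wrapping and data pipeline are omitted for brevity.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

NUM_EPOCHS = 5
ACCUM_STEPS = 4  # gradient accumulation -> effective batch = 16 x 2 GPUs x 4

optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = OneCycleLR(
    optimizer, max_lr=1e-3, epochs=NUM_EPOCHS,
    steps_per_epoch=len(train_loader) // ACCUM_STEPS,
    pct_start=0.2, anneal_strategy="cos",
)
scaler = torch.cuda.amp.GradScaler()
criterion = torch.nn.CrossEntropyLoss()

model.train()
for step, (frames, labels) in enumerate(train_loader):
    frames, labels = frames.cuda(), labels.cuda()
    with torch.cuda.amp.autocast():                # FP16 mixed precision
        loss = criterion(model(frames), labels) / ACCUM_STEPS
    scaler.scale(loss).backward()
    if (step + 1) % ACCUM_STEPS == 0:              # update every 4 mini-batches
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()
```
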
---

## 📁 Dataset Information

| Property | Value |
|----------|-------|
| **Name** | Korean Driver Behavior Dataset |
| **Total Videos** | 243,979 |
| **Total Samples** | 1,371,062 (sliding window) |
| **Window Size** | 30 frames |
| **Stride** | 15 frames |
| **Resolution** | Various (resized to 224×224) |
| **FPS** | 30 |

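For reference, the sliding-window sampling above works roughly as sketched below: each video is cut into overlapping 30-frame clips taken every 15 frames, which is how ~244k videos yield ~1.37M samples. This is an illustrative assumption of the windowing scheme, not the dataset-preparation code itself.

```python
def sliding_window_starts(num_frames, window=30, stride=15):
    """Start indices of 30-frame windows taken every 15 frames."""
    if num_frames < window:
        return [0]  # short videos are padded to a single window
    return list(range(0, num_frames - window + 1, stride))

# e.g. a 10-second clip at 30 FPS (300 frames) yields 19 overlapping samples
print(len(sliding_window_starts(300)))  # 19
```
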
### Class Distribution

| Class | Samples | Percentage |
|-------|---------|------------|
| 정상 (Normal) | 159,224 | 11.6% |
| 졸음운전 (Drowsy) | 619,450 | 45.2% |
| 물건찾기 (Searching) | 261,435 | 19.1% |
| 휴대폰 사용 (Phone) | 150,981 | 11.0% |
| 운전자 폭행 (Assault) | 179,972 | 13.1% |

---

## ⚠️ Limitations & Considerations

1. **Camera placement**: optimized for cameras mounted in front of or to the side of the driver's seat
2. **Lighting conditions**: performance may degrade in low-light environments such as nighttime driving or tunnels
3. **Occlusion**: accuracy may drop when the driver wears sunglasses or a mask
4. **Real-time requirements**: a GPU is required; CPU inference is significantly slower

---

## 📜 License

Apache 2.0

---

## 🔗 Citation

```bibtex
@misc{driver-behavior-detection-2025,
  title={Driver Abnormal Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/driver-behavior-swin-t}}
}
```

---

## 📞 Contact

Issues and questions: [GitHub Issues](https://github.com/YOUR_USERNAME/driver-behavior-detection/issues)
config.json
ADDED
@@ -0,0 +1,52 @@
{
  "architectures": [
    "VideoSwinTransformer"
  ],
  "model_type": "video-swin-transformer",
  "backbone": "swin3d_t",
  "pretrained_source": "kinetics400",
  "num_classes": 5,
  "class_names": [
    "정상",
    "졸음운전",
    "물건찾기",
    "휴대폰 사용",
    "운전자 폭행"
  ],
  "input_size": {
    "frames": 30,
    "height": 224,
    "width": 224,
    "channels": 3
  },
  "input_format": "CTHW",
  "training": {
    "epochs_trained": 1,
    "total_epochs": 5,
    "batch_size": 16,
    "effective_batch_size": 128,
    "learning_rate": 0.001,
    "optimizer": "AdamW",
    "scheduler": "OneCycleLR",
    "mixed_precision": true,
    "gradient_accumulation_steps": 4
  },
  "metrics": {
    "accuracy": 0.9551,
    "macro_f1": 0.9436,
    "per_class_f1": {
      "정상": 0.92,
      "졸음운전": 0.98,
      "물건찾기": 0.92,
      "휴대폰 사용": 0.9,
      "운전자 폭행": 1.0
    }
  },
  "dataset": {
    "name": "Korean Driver Behavior Dataset",
    "total_samples": 1371062,
    "num_videos": 243979,
    "sliding_window": 30,
    "stride": 15
  }
}
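These fields can be consumed directly at inference time, for example to map predicted indices back to class names. A minimal sketch, assuming config.json sits next to the weights in the working directory:

```python
import json

# Load the model's machine-readable metadata
with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

class_names = config["class_names"]          # ["정상", "졸음운전", ...]
num_classes = config["num_classes"]          # 5
num_frames = config["input_size"]["frames"]  # 30

# Map a predicted index (e.g. from the Quick Start code) to its label
pred_idx = 1
print(f"{pred_idx} -> {class_names[pred_idx]}")  # 1 -> 졸음운전
```
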
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc7eb66a00e43a79a4db83cad13a36dc97b87d500a1a6f0bcec72779d22fdaf9
size 126244047