---
library_name: transformers
tags:
- image-classification
- multi-head-classification
- room-classification
- dinov2
- computer-vision
- scene-classification
license: apache-2.0
language:
- en
pipeline_tag: image-classification
base_model:
- facebook/dinov2-large
---

# Room Scene Classifier

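A multi-head scene classification model for hotel images, built on DINOv2.

## Model Overview

This model classifies hotel images from three perspectives at once: **Scene**, **Concept**, and **Object**. A DINOv2 backbone extracts general-purpose visual features, and each head runs its own specialized classification on top of them.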

## Model Details

- **Model name**: `image_classifier_model_0.2`
- **Base model**: `facebook/dinov2-large`
- **Input size**: 224x224
- **Channels**: RGB (3 channels)
- **Total parameters**: 303,252,502 (backbone frozen)
- **Trainable parameters**: 24,598
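
As a rough sanity check on the trainable-parameter figure, the sketch below counts the parameters that plain linear heads would need. It assumes 1024-dimensional backbone features (the hidden size of DINOv2-large) and a single linear layer with bias per head, with 6, 3, and 13 output classes. The actual head design is not documented in this card, so treat this as an illustration only.

```python
# Assumptions (not confirmed by this card): each head is a single
# linear layer with bias on top of 1024-dim backbone features.
FEATURE_DIM = 1024
HEAD_CLASSES = {"scene": 6, "concept": 3, "object": 13}
REPORTED_TRAINABLE = 24_598  # figure stated in this card


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


per_head = {name: linear_params(FEATURE_DIM, n) for name, n in HEAD_CLASSES.items()}
linear_total = sum(per_head.values())

print(per_head)                           # {'scene': 6150, 'concept': 3075, 'object': 13325}
print(linear_total)                       # 22550
print(REPORTED_TRAINABLE - linear_total)  # 2048
```

The remaining 2,048 parameters equal 2 × 1024, which would be consistent with an extra trainable layer over the features (for example a normalization layer), but that is a guess rather than something this card states.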

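## Classification Heads

### Scene head (6 classes)
- Guest room, bathroom, swimming pool, lobby, restaurant, other

### Concept head (3 classes)
- Indoor, outdoor, close-up

### Object head (13 classes)
- Bed, sofa, shower, bathtub, chair, table, TV, refrigerator, sink, vanity, mirror, other, unclassified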

## Usage

### Using the model from Python

```python
import json

import numpy as np
import onnxruntime as ort
import torch
from PIL import Image
from torchvision import transforms

# Load model metadata
with open('image_classifier_model_0.2_model_info.json', 'r', encoding='utf-8') as f:
    model_info = json.load(f)

# Load the PyTorch model. Calling .eval() on the result implies the file holds
# a full pickled module rather than a state_dict, so weights_only=False is
# needed on PyTorch 2.6+. Only load files from a source you trust.
model = torch.load('image_classifier_model_0.2.pth', map_location='cpu', weights_only=False)
model.eval()

# ONNX model (faster inference)
onnx_session = ort.InferenceSession(
    'image_classifier_model_0.2.onnx', providers=['CPUExecutionProvider']
)
# Read the input name from the graph instead of hard-coding it
onnx_input_name = onnx_session.get_inputs()[0].name

# Image preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

def classify_image_pytorch(image_path):
    """Classify an image with the PyTorch model."""
    # convert('RGB') guards against grayscale, palette, and RGBA inputs
    image = transform(Image.open(image_path).convert('RGB')).unsqueeze(0)

    with torch.no_grad():
        outputs = model(image)
        predictions = {}

        for head_name, logits in outputs.items():
            probabilities = torch.softmax(logits, dim=1)
            predicted_class = torch.argmax(probabilities, dim=1).item()
            confidence = probabilities[0, predicted_class].item()

            predictions[head_name] = {
                'class_id': predicted_class,
                'confidence': confidence,
                'probabilities': probabilities[0].tolist()
            }

    return predictions

def classify_image_onnx(image_path):
    """Classify an image with the ONNX model (recommended)."""
    # Add the batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    image = transform(Image.open(image_path).convert('RGB')).unsqueeze(0).numpy()

    # ONNX inference
    input_feed = {onnx_input_name: image.astype(np.float32)}
    outputs = onnx_session.run(None, input_feed)

    predictions = {}
    # Assumes the ONNX outputs are ordered scene, concept, object
    head_names = ['scene', 'concept', 'object']

    for i, head_name in enumerate(head_names):
        logits = outputs[i]
        probabilities = torch.softmax(torch.tensor(logits), dim=1)
        predicted_class = torch.argmax(probabilities, dim=1).item()
        confidence = probabilities[0, predicted_class].item()

        predictions[head_name] = {
            'class_id': predicted_class,
            'confidence': confidence,
            'probabilities': probabilities[0].tolist()
        }

    return predictions

# Example usage
predictions = classify_image_onnx("hotel_room.jpg")
print("Classification results:")
for head, result in predictions.items():
    print(f"{head}: class {result['class_id']}, confidence {result['confidence']:.4f}")
```

### Converting class IDs to actual class names

```python
def get_class_names(predictions, model_info):
    """Map each head's predicted index to the actual class ID in model_info."""
    class_mappings = model_info['class_mappings']

    results = {}
    for head, result in predictions.items():
        class_id = result['class_id']
        if head in class_mappings:
            # JSON object keys are strings, so look up the index as a string
            actual_class_id = class_mappings[head][str(class_id)]
            results[head] = {
                'class_id': actual_class_id,
                'confidence': result['confidence']
            }

    return results

# Example: convert the predictions from the previous step
class_names = get_class_names(predictions, model_info)
print("Actual class IDs:")
for head, result in class_names.items():
    print(f"{head}: {result['class_id']}")
```

### Batch processing

```python
def classify_batch_images(image_paths):
    """Classify several images, one at a time, and collect the results."""
    results = []

    for image_path in image_paths:
        predictions = classify_image_onnx(image_path)
        results.append({
            'image_path': image_path,
            'predictions': predictions
        })

    return results

# Example
image_paths = ["room1.jpg", "bathroom1.jpg", "lobby1.jpg"]
batch_results = classify_batch_images(image_paths)

for result in batch_results:
    print(f"\nImage: {result['image_path']}")
    for head, pred in result['predictions'].items():
        print(f"  {head}: class {pred['class_id']}, confidence {pred['confidence']:.4f}")
```
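
`classify_image_onnx` uses `torch.softmax` to turn logits into probabilities. The same step can be written in NumPy, which also handles a whole batch of logits in one call. The sketch below is a minimal illustration: `decode_heads` is a hypothetical helper, and it assumes each output is a `(batch, num_classes)` array in the order scene, concept, object.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def decode_heads(outputs, head_names=("scene", "concept", "object")):
    """Turn a list of (batch, num_classes) logit arrays into per-image predictions."""
    batch_size = outputs[0].shape[0]
    results = [{} for _ in range(batch_size)]
    for head_name, logits in zip(head_names, outputs):
        probs = softmax(np.asarray(logits, dtype=np.float32))
        class_ids = probs.argmax(axis=-1)
        for i in range(batch_size):
            results[i][head_name] = {
                "class_id": int(class_ids[i]),
                "confidence": float(probs[i, class_ids[i]]),
            }
    return results


# Dummy logits for a batch of two images (6, 3, and 13 classes per head)
rng = np.random.default_rng(0)
dummy_outputs = [rng.normal(size=(2, n)) for n in (6, 3, 13)]
decoded = decode_heads(dummy_outputs)
print(decoded[0]["scene"])
```

Whether the exported ONNX graph accepts batch sizes above 1 depends on how it was exported, which this card does not state.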

## Model Files

- `image_classifier_model_0.2.pth`: PyTorch model file
- `image_classifier_model_0.2.onnx`: ONNX model file (optimized for inference)
- `image_classifier_model_0.2_model_info.json`: model metadata
- `image_classifier_model_0.2_inference_example.py`: inference example code


## Model Architecture

### Multi-head classification system
```
Input image (224×224)
    ↓
DINOv2 backbone (frozen)
    ↓
Shared features (1024-dim)
    ├─── Scene head → 6 classes
    ├─── Concept head → 3 classes
    └─── Object head → 13 classes
```
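
The diagram above can be sketched in PyTorch roughly as follows. This is an illustration under stated assumptions, not the released implementation: `MultiHeadClassifier` is a hypothetical name, each head is assumed to be a single linear layer, and a tiny stand-in backbone replaces DINOv2 so the example runs without downloading weights. How the 1024-dimensional feature is pooled from the DINOv2 tokens is not specified in this card.

```python
import torch
from torch import nn


class MultiHeadClassifier(nn.Module):
    """Frozen backbone followed by one linear head per task (illustrative)."""

    def __init__(self, backbone: nn.Module, feature_dim: int = 1024, head_classes=None):
        super().__init__()
        head_classes = head_classes or {"scene": 6, "concept": 3, "object": 13}
        self.backbone = backbone
        # Freeze the backbone so only the heads receive gradients
        for param in self.backbone.parameters():
            param.requires_grad = False
        self.heads = nn.ModuleDict(
            {name: nn.Linear(feature_dim, n) for name, n in head_classes.items()}
        )

    def forward(self, images: torch.Tensor) -> dict:
        with torch.no_grad():
            features = self.backbone(images)  # expected shape: (batch, feature_dim)
        return {name: head(features) for name, head in self.heads.items()}


# Stand-in backbone (assumption): global average pool + projection to 1024 dims
dummy_backbone = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 1024))
sketch_model = MultiHeadClassifier(dummy_backbone)

logits = sketch_model(torch.randn(2, 3, 224, 224))
print({name: tuple(t.shape) for name, t in logits.items()})
# {'scene': (2, 6), 'concept': (2, 3), 'object': (2, 13)}
```

Because the three heads share one forward pass through the backbone, adding a head costs only one extra small layer at inference time.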

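### Key features
- **DINOv2 backbone**: feature extraction with a strong vision transformer
- **Frozen backbone**: reuses pretrained features, which helps prevent overfitting
- **Multi-head**: three heads analyze each image from different angles
- **Class weights**: automatically compensate for imbalanced data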
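
This card does not say how the class weights are computed. One common scheme is inverse-frequency ("balanced") weighting, sketched below. Both the choice of formula and the label counts are assumptions made for illustration, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: w_c = N / (K * n_c), where N is the number
    of samples, K the number of classes, and n_c the count of class c."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        n_c = counts.get(c, 0)
        # Classes absent from the data get weight 0.0 to avoid division by zero
        weights.append(total / (num_classes * n_c) if n_c else 0.0)
    return weights


# Hypothetical, imbalanced labels for a 3-class head
labels = [0] * 80 + [1] * 15 + [2] * 5
weights = balanced_class_weights(labels, num_classes=3)
print([round(w, 3) for w in weights])  # [0.417, 2.222, 6.667]
```

Weights like these can be passed to a weighted loss, for example through the `weight` argument of `torch.nn.CrossEntropyLoss`, so that mistakes on rare classes count for more.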

## Preprocessing Requirements

1. **Image size**: 224x224 pixels
2. **Color space**: RGB
3. **Normalization**: standard ImageNet values (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
4. **Crop**: center crop
5. **Supported formats**: JPG, PNG, JPEG
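
For deployments that run the ONNX model without torchvision, the requirements above can be approximated with Pillow and NumPy. The sketch below is an assumption-laden illustration: `preprocess` is a hypothetical helper. By default it resizes straight to 224×224, mirroring the `transforms.Resize((224, 224))` call in the usage example; with `keep_aspect=True` it resizes the shorter side and then center-crops. Which variant matches training is not stated in this card, and resampling details may differ slightly from torchvision.

```python
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image: Image.Image, size: int = 224, keep_aspect: bool = False) -> np.ndarray:
    """Return a float32 array of shape (1, 3, size, size)."""
    image = image.convert("RGB")
    if keep_aspect:
        # Resize the shorter side to `size`, then center-crop the longer side
        w, h = image.size
        scale = size / min(w, h)
        new_w, new_h = max(size, round(w * scale)), max(size, round(h * scale))
        image = image.resize((new_w, new_h), Image.BILINEAR)
        left, top = (new_w - size) // 2, (new_h - size) // 2
        image = image.crop((left, top, left + size, top + size))
    else:
        # Direct resize to a square (may change the aspect ratio)
        image = image.resize((size, size), Image.BILINEAR)
    # Scale to [0, 1], normalize per channel, reorder HWC -> CHW, add batch dim
    array = np.asarray(image, dtype=np.float32) / 255.0
    array = (array - IMAGENET_MEAN) / IMAGENET_STD
    return np.ascontiguousarray(array.transpose(2, 0, 1)[np.newaxis, ...])


# Synthetic image so the example runs without any files on disk
demo = Image.new("RGB", (640, 480), (128, 64, 32))
batch = preprocess(demo)
print(batch.shape, batch.dtype)
```

The returned array can be fed to an ONNX Runtime session directly, since it already has the batch dimension and `float32` dtype.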

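## Use Cases

### Direct use

- **Automatic hotel image classification**: sort images by scene, such as guest room, bathroom, or lobby
- **Image metadata generation**: extract scene, concept, and object information automatically
- **Image database management**: auto-tag large collections of hotel images
- **Quality control**: verify that image classifications are consistent

### Downstream use

- **Hotel management systems**: automatic classification and management of room images
- **Travel platforms**: image filtering by room type
- **Real estate platforms**: automatic extraction of property amenity information
- **Image search engines**: image search based on multiple attributes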

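## Limitations

1. **Domain-specific**: The model is specialized for hotel and lodging images, so performance on other domains is limited.
2. **Image quality**: Performance may degrade on low-resolution or noisy images.
3. **Viewpoint dependence**: Performance may vary with the angle from which the image was taken.
4. **Class imbalance**: Some classes may perform worse than others.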
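
Given these limitations, one practical safeguard is to route low-confidence predictions to manual review instead of trusting them. The sketch below is a minimal illustration: `flag_low_confidence` is a hypothetical helper, the 0.6 threshold is an arbitrary assumption that should be tuned on validation data, and the example prediction is made up.

```python
def flag_low_confidence(predictions, threshold=0.6):
    """Return the heads whose top-class confidence falls below `threshold`."""
    return [
        head
        for head, result in predictions.items()
        if result["confidence"] < threshold
    ]


# Hypothetical prediction in the format returned by the usage examples
example = {
    "scene": {"class_id": 0, "confidence": 0.93},
    "concept": {"class_id": 1, "confidence": 0.41},
    "object": {"class_id": 5, "confidence": 0.58},
}
print(flag_low_confidence(example))  # ['concept', 'object']
```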

## License

Apache 2.0 License

## Notes

This model was developed as part of the Room Clusterer project. For more details, see the [project repository](https://github.com/tportio/content-ml-trainer).