---
library_name: transformers
tags:
  - vision
  - image-segmentation
  - universal-segmentation
  - korean-road
  - oneformer
  - distillation
  - aihub
model_name: koalaseg
---

# KoalaSeg 🐨🛣️  

## Colab Inference
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)

_KOrean lAyered assistive Segmentation_ 

![Inference Demo](./overlay_ft_20250621_130747.png)  

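A **Universal Segmentation** model built specifically for Korean road and pedestrian environments.  
The ground truth (GT) was generated with a **layered ensemble** built around a OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, stacking the following sources in this order:

1. Hand-annotated XML polygons
2. A Korea-specific model trained on AIHUB road and pedestrian-environment data: Surface Mask (5k) + Polygon (500)
3. Cityscapes masks

An Edge-ViT 20 M student model was then **distilled** on this GT.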
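The layering order above can be illustrated with a small priority-merge sketch. It assumes that "in this order" means earlier sources win wherever they provide a label, and that unlabeled pixels carry a sentinel value; the `IGNORE` value and the function name `layered_ensemble` are hypothetical and do not come from the KoalaSeg code base.

```python
import numpy as np

IGNORE = 255  # hypothetical "no label" sentinel, not taken from the model card


def layered_ensemble(layers, ignore=IGNORE):
    """Merge label maps given in priority order (highest priority first).

    A pixel takes the label of the first layer that has a non-ignore value
    there; pixels that no layer labels stay at `ignore`.
    """
    merged = np.full(layers[0].shape, ignore, dtype=layers[0].dtype)
    for layer in layers:
        # Only fill pixels that are still empty and that this layer labels.
        fill = (merged == ignore) & (layer != ignore)
        merged[fill] = layer[fill]
    return merged


# Toy 2x2 label maps standing in for the three sources.
manual_polygons = np.array([[1, IGNORE], [IGNORE, IGNORE]], dtype=np.uint8)
korean_model = np.array([[2, 2], [IGNORE, IGNORE]], dtype=np.uint8)
cityscapes = np.array([[3, 3], [3, IGNORE]], dtype=np.uint8)

gt = layered_ensemble([manual_polygons, korean_model, cityscapes])
```

Here the manual polygon label survives in the top-left pixel, the Korea-specific model fills the top-right pixel, and Cityscapes only contributes where neither higher-priority layer has a label.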

---

## Model Details

- **Developed by**: Team RoadSight  
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`  
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)  
- **Framework**: 🤗 Transformers & PyTorch

---

## Training Data

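AIHUB sidewalk and pedestrian-environment data (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):

- **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)  
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**  
- **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 used**  
- **Depth Prediction**: 170,000 images (stereo depth)

A total of **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were labeled with the layered ensemble;  
the GT was then produced after morphological open/close and MedianBlur (17 px).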
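The post-processing step can be sketched for a single binary class mask as follows. This is a NumPy-only illustration, not the authors' pipeline: the 3×3 structuring element is an assumption (the card only states the 17 px median), and in practice OpenCV's `cv2.morphologyEx` and `cv2.medianBlur` would be the more usual and much faster choice.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(mask, k, reducer):
    """Apply `reducer` (np.min, np.max or np.median) over every k x k window."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return reducer(windows, axis=(-2, -1)).astype(mask.dtype)


def morph_open(mask, k):
    """Erosion followed by dilation: removes specks smaller than the kernel."""
    return _window_filter(_window_filter(mask, k, np.min), k, np.max)


def morph_close(mask, k):
    """Dilation followed by erosion: fills holes smaller than the kernel."""
    return _window_filter(_window_filter(mask, k, np.max), k, np.min)


def clean_mask(mask, morph_k=3, median_k=17):
    """Open, close, then median-filter a binary mask (morph_k is an assumption)."""
    cleaned = morph_close(morph_open(mask, morph_k), morph_k)
    return _window_filter(cleaned, median_k, np.median)


# Toy mask: a 20x20 block with a one-pixel hole, plus an isolated speck.
mask = np.zeros((40, 40), dtype=np.uint8)
mask[10:30, 10:30] = 1
mask[20, 20] = 0   # hole inside the block
mask[2, 2] = 1     # isolated noise pixel

cleaned = clean_mask(mask)
```

In this toy case the speck disappears and the hole is filled. The 17 px median also rounds off the corners of the block, which is the smoothing effect such a wide filter is typically used for.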

---

## Speeds & Sizes (512×512, batch=1)

| Device                | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg**       |
|-----------------------|---------------------|--------------------|-----------------|--------------------|
| **A100**              | 3.58 s → 0.28 FPS   | 3.74 s → 0.27 FPS  | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
| **T4**                | 5.61 s → 0.18 FPS   | 6.01 s → 0.17 FPS  | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
| **CPU (i9-12900K)**   | 124 s → 0.008 FPS   | 150 s → 0.007 FPS  | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
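Each cell in the table pairs a per-image latency with its reciprocal throughput (FPS = 1 / latency). The measurement protocol is not described in this card, so the following is only a sketch of how such numbers could be collected; the helper name `benchmark` and the warm-up and run counts are hypothetical.

```python
import time


def benchmark(fn, warmup=3, runs=10):
    """Return (mean latency in seconds, frames per second) for callable `fn`."""
    for _ in range(warmup):      # warm-up calls are excluded from timing
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    latency = (time.perf_counter() - start) / runs
    return latency, 1.0 / latency


def latency_to_fps(latency_s):
    """Convert a per-image latency in seconds to frames per second."""
    return 1.0 / latency_s


# Stand-in workload; replace with a closure around the model's forward pass.
latency, fps = benchmark(lambda: sum(range(10_000)))
```

When timing on a GPU, the callable should end with `torch.cuda.synchronize()` so that asynchronous kernels are included in the measured interval.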

---

## Quick Start
```python
from io import BytesIO

import matplotlib.pyplot as plt
import numpy as np
import requests
import torch
from PIL import Image
from transformers import AutoModelForUniversalSegmentation, AutoProcessor

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU
proc  = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to(device).eval()

# 1. Download image -------------------------------------------
url  = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, timeout=30)
resp.raise_for_status()           # stop early if the download failed
img  = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
# PIL reports size as (W, H); target_sizes expects (H, W), hence the reversal.
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
class_ids = np.unique(idmap)
cmap      = plt.get_cmap("tab20", max(20, len(class_ids)))
mask_rgb  = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(class_ids):
    if cid == 0:                  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)

mask_img = Image.fromarray(mask_rgb)
overlay  = Image.blend(img, mask_img, alpha=0.6)   # alpha=0.6 → emphasize the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```
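Beyond the overlay, the class-id map from step 3-A can be summarized numerically. The sketch below computes the share of pixels per class; it assumes an id-to-name mapping is available (for Transformers checkpoints this is usually `model.config.id2label`), and the label names in the toy example are hypothetical rather than KoalaSeg's actual classes.

```python
import numpy as np


def class_coverage(idmap, id2label):
    """Return {class name: fraction of pixels} for a 2-D class-id map."""
    ids, counts = np.unique(idmap, return_counts=True)
    total = idmap.size
    return {
        # Fall back to a generic name when an id is missing from the mapping.
        id2label.get(int(i), f"class_{int(i)}"): float(c) / total
        for i, c in zip(ids, counts)
    }


# Toy example with hypothetical labels.
toy_idmap = np.array([[0, 0], [1, 2]])
toy_labels = {0: "background", 1: "sidewalk"}
coverage = class_coverage(toy_idmap, toy_labels)
```

With the Quick Start variables, the equivalent call would be `class_coverage(idmap, model.config.id2label)`, provided the checkpoint's config defines that mapping.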


## Intended Uses
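- **Road segmentation** for visually impaired users
- Support for Korean HD maps and road maintenance
- Benchmarking Korea-specific datasets for academic and research purposes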

### Out-of-Scope
- Non-road domains such as medical, satellite, or indoor imagery
- Sensitive tasks such as personal identification or surveillance

---

## Limitations & Risks
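- **Korean roads only**: performance degrades overseas and in conditions such as extreme low light or heavy rain
- Detection of partially occluded people is unstable → use only as an assistive aid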

---

## Citation
@misc{KoalaSeg2025,
  title = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year = {2025},
  url   = {https://huggingface.co/gj5520/KoalaSeg}
}