---
library_name: transformers
tags:
  - vision
  - image-segmentation
  - universal-segmentation
  - korean-road
  - oneformer
  - distillation
  - aihub
model_name: koalaseg
---

# KoalaSeg 🐨🛣️  

## Colab Inference
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)

_KOrean lAyered assistive Segmentation_ 

![Inference Demo](./overlay_ft_20250621_130747.png)  

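A **Universal Segmentation** model built specifically for Korean road and pedestrian environments.  
Ground-truth (GT) labels were generated with a **layered ensemble** around an OneFormer teacher based on `shi-labs/oneformer_cityscapes_swin_large`, stacking the following sources in order:

1. Hand-annotated XML polygons
2. A Korean-specific model trained on AIHUB road and pedestrian environment data: Surface Mask (5k) + Polygon (500)
3. Cityscapes masks

An Edge-ViT 20 M student model was then **distilled** from this GT.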
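
As a rough illustration of the layered ensemble described above, the sketch below merges label maps in priority order, so that each pixel takes its label from the first layer that provides one. The `IGNORE` value of 255, the function name `layered_merge`, and the rule that earlier layers win are assumptions made for illustration; this model card does not specify how the layers are actually combined.

```python
import numpy as np

IGNORE = 255  # hypothetical "no label" id; not taken from the model card


def layered_merge(layers):
    """Merge label maps given in priority order (first layer wins)."""
    merged = np.full_like(layers[0], IGNORE)
    for layer in layers:
        empty = merged == IGNORE        # pixels that still have no label
        merged[empty] = layer[empty]    # fill them from the current layer
    return merged


# Toy 2x2 example with three layers, highest priority first.
manual = np.array([[1, IGNORE], [IGNORE, IGNORE]], dtype=np.uint8)
korean = np.array([[2, 3], [IGNORE, IGNORE]], dtype=np.uint8)
cityscapes = np.array([[4, 4], [5, IGNORE]], dtype=np.uint8)

gt = layered_merge([manual, korean, cityscapes])
```

Pixels that no layer labels keep the `IGNORE` value, which a training loop would typically exclude from the loss.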

---

## Model Details

- **Developed by**: Team RoadSight  
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`  
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)  
- **Framework**: 🤗 Transformers & PyTorch

---

## Training Data

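AIHUB sidewalk and pedestrian environment data (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):

- **Bounding Box**: 350,000 images (bounding-box annotations for 29 obstacle classes)
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**
- **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 used**
- **Depth Prediction**: 170,000 images (stereo depth)

In total, **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were labeled with the layered ensemble.  
The GT was then produced after applying Morph Open/Close and MedianBlur (17 px).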
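
The post-processing named above (morphological open/close followed by a 17 px median blur) can be sketched on a single binary mask as shown below. This is a NumPy-only illustration, not the authors' pipeline: the names suggest OpenCV (`cv2.morphologyEx`, `cv2.medianBlur`) was the likely tool, and the 5 px morphology kernel, the edge padding, and the helper names `_window_filter` and `clean_mask` are assumptions. Only the 17 px median size comes from this card.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(mask, ksize, reduce_fn):
    """Apply reduce_fn over every ksize x ksize neighbourhood (edge-padded)."""
    pad = ksize // 2
    padded = np.pad(mask, pad, mode="edge")
    windows = sliding_window_view(padded, (ksize, ksize))
    return reduce_fn(windows, axis=(-2, -1)).astype(mask.dtype)


def clean_mask(mask, morph_ksize=5, median_ksize=17):
    """Open, close, then median-filter a binary uint8 mask (values 0 or 255)."""
    erode = lambda m: _window_filter(m, morph_ksize, np.min)
    dilate = lambda m: _window_filter(m, morph_ksize, np.max)
    opened = dilate(erode(mask))       # opening removes small specks
    closed = erode(dilate(opened))     # closing fills small holes
    return _window_filter(closed, median_ksize, np.median)  # smooths edges


# Toy mask: a filled square with one stray pixel and a one-pixel hole.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 255
mask[2, 2] = 255     # isolated speck
mask[32, 32] = 0     # one-pixel hole
cleaned = clean_mask(mask)
```

For a multi-class label map, one option is to run this per class mask; whether the original pipeline did so is not stated here.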

---

## Speeds & Sizes (512×512, batch=1)

| Device                | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg**       |
|-----------------------|---------------------|--------------------|-----------------|--------------------|
| **A100**              | 3.58 s → 0.28 FPS   | 3.74 s → 0.27 FPS  | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
| **T4**                | 5.61 s → 0.18 FPS   | 6.01 s → 0.17 FPS  | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
| **CPU (i9-12900K)**   | 124 s → 0.008 FPS   | 150 s → 0.007 FPS  | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
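
Each cell in the table pairs a per-image latency with the corresponding throughput, FPS = 1 / latency. The sketch below shows one way such numbers could be measured and converted. `measure_latency` and `to_fps` are hypothetical helpers rather than the authors' benchmark protocol, and on a GPU the timed function would also need to synchronize (for example with `torch.cuda.synchronize()`) before the clock is read.

```python
import time


def measure_latency(fn, warmup=2, runs=5):
    """Return the mean wall-clock seconds per call of fn, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def to_fps(latency_s):
    """Convert seconds per image into frames per second."""
    return 1.0 / latency_s


# Stand-in workload; in practice fn would run one forward pass of the model.
latency = measure_latency(lambda: sum(range(10_000)))
fps = to_fps(latency)

# Conversion applied to a latency value from the table (A100, baseline).
baseline_a100_fps = to_fps(3.58)
```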

---

## Quick Start
```python
from transformers import AutoProcessor, AutoModelForUniversalSegmentation
import torch, requests, matplotlib.pyplot as plt, numpy as np
from PIL import Image
from io import BytesIO

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
device = "cuda" if torch.cuda.is_available() else "cpu"   # fall back to CPU when no GPU is present
proc  = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to(device).eval()

# 1. Download image -------------------------------------------
url  = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, timeout=30)
resp.raise_for_status()                                    # fail early on HTTP errors
img  = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
cmap      = plt.get_cmap("tab20", max(20, len(np.unique(idmap))))
mask_rgb  = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(np.unique(idmap)):
    if cid == 0:                  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)

mask_img = Image.fromarray(mask_rgb)
overlay  = Image.blend(img, mask_img, alpha=0.6)   # 0.6 → emphasize the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```


## Intended Uses
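- **Road segmentation** for visually impaired users
- Support for Korean HD mapping and road maintenance
- Benchmarking on Korean datasets for academic and research purposes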

### Out-of-Scope
- Non-road domains such as medical, satellite, or indoor imagery
- Sensitive tasks such as personal identification or surveillance

---

## Limitations & Risks
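- **Korean roads only**: performance degrades overseas and in conditions such as extreme low light or heavy rain
- Detection of partially occluded people is unstable, so use the model only as an assistive aid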

---

## Citation
```bibtex
@misc{KoalaSeg2025,
  title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year   = {2025},
  url    = {https://huggingface.co/gj5520/KoalaSeg}
}
```