---
library_name: transformers
tags:
- vision
- image-segmentation
- universal-segmentation
- korean-road
- oneformer
- distillation
- aihub
model_name: koalaseg
---
# KoalaSeg 🐨🛣️
## Colab Inference
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)
_KOrean lAyered assistive Segmentation_
![Inference Demo](./overlay_ft_20250621_130747.png)
A **Universal Segmentation** model built specifically for Korean road and pedestrian environments.
Starting from the OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, ground truth (GT) was generated by a **layered ensemble** of the following sources, in this order:
1. Hand-annotated XML polygons
2. A Korea-specific model trained on AIHUB road and pedestrian environment data: Surface Mask (5k) + Polygon (500)
3. Cityscapes masks

An Edge-ViT 20 M student model was then **distilled** from this GT.

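
The layered ensemble described above can be pictured as a priority merge of label maps. The following is a minimal sketch and not the authors' pipeline: it assumes that earlier layers take priority and later layers only fill pixels that are still unlabelled, that all layers already share one class-ID space, and that `255` marks "unlabelled" (the function name `layered_ensemble` and the `IGNORE` value are hypothetical).

```python
import numpy as np

IGNORE = 255  # assumed "unlabelled" id; the model card does not specify one


def layered_ensemble(layers, ignore_id=IGNORE):
    """Merge label maps in priority order: earlier layers win, later ones fill gaps."""
    merged = np.full_like(layers[0], ignore_id)
    for layer in layers:
        # Only write into pixels that no higher-priority layer has labelled yet.
        fill = (merged == ignore_id) & (layer != ignore_id)
        merged[fill] = layer[fill]
    return merged


# Toy 2x3 example: hand-made polygons > K-Road model > Cityscapes teacher.
xml_polygons = np.array([[7, 255, 255], [255, 255, 255]], dtype=np.uint8)
kroad_pred = np.array([[3, 3, 255], [255, 4, 255]], dtype=np.uint8)
cityscapes = np.array([[0, 0, 0], [1, 1, 1]], dtype=np.uint8)

gt = layered_ensemble([xml_polygons, kroad_pred, cityscapes])
print(gt.tolist())  # [[7, 3, 0], [1, 4, 1]]
```

In practice the three sources would likely use different class-ID spaces, so a remapping step to a common label set would have to come before a merge like this.
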
---
## Model Details
- **Developed by**: Team RoadSight
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
- **Framework**: 🤗 Transformers & PyTorch
---
## Training Data
AIHUB sidewalk and pedestrian environment data (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
- **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**
- **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 used**
- **Depth Prediction**: 170,000 images (stereo depth)

A total of **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were labelled with the layered ensemble, and the GT was produced after Morph Open/Close + MedianBlur(17px).

---
## Speeds & Sizes (512×512, batch=1)
| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
|-----------------------|---------------------|--------------------|-----------------|--------------------|
| **A100** | 3.58 s → 0.28 FPS | 3.74 s → 0.27 FPS | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
| **T4** | 5.61 s → 0.18 FPS | 6.01 s → 0.17 FPS | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
| **CPU (i9-12900K)** | 124 s → 0.008 FPS | 150 s → 0.007 FPS | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
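
Each cell in the table pairs a per-image latency with its reciprocal in frames per second. The sketch below shows one way such numbers could be collected; it is not the benchmarking script behind the table, and the helper names `measure_latency` and `to_fps` as well as the warm-up and run counts are assumptions.

```python
import time


def measure_latency(fn, warmup=2, runs=5):
    """Return the mean wall-clock seconds per call of fn() after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def to_fps(seconds_per_image):
    """At batch size 1, frames per second is the reciprocal of the per-image latency."""
    return 1.0 / seconds_per_image


# Stand-in workload; a real run would wrap the model's forward pass instead.
latency = measure_latency(lambda: sum(range(10_000)))
print(f"{latency:.6f} s -> {to_fps(latency):.2f} FPS")
print(round(to_fps(3.58), 2))  # 0.28, matching the A100 baseline cell
```

When timing on a GPU, call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.
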
---
## Quick Start
```python
from transformers import AutoProcessor, AutoModelForUniversalSegmentation
import torch, requests, matplotlib.pyplot as plt, numpy as np
from PIL import Image
from io import BytesIO
# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda").eval()
# 1. Download image -------------------------------------------
url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, stream=True)
img = Image.open(BytesIO(resp.content)).convert("RGB")
# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs)
# 3-A. Get class-id map ---------------------------------------
idmap = proc.post_process_semantic_segmentation(
out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()
# 3-B. Convert idmap → RGB mask + overlay ---------------------
cmap = plt.get_cmap("tab20", max(20, len(np.unique(idmap))))
mask_rgb = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(np.unique(idmap)):
    if cid == 0:  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)
mask_img = Image.fromarray(mask_rgb)
overlay = Image.blend(img, mask_img, alpha=0.6)  # 0.6 → emphasize the mask
# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```
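
Beyond the overlay, the `idmap` from step 3-A can be summarized numerically. The helper below is a hypothetical addition, not part of the released code: it reports the share of pixels per class, and it assumes the checkpoint's `model.config.id2label` is populated if you want names instead of raw IDs. The `"road"` and `"sidewalk"` labels in the toy call are placeholders.

```python
import numpy as np


def class_coverage(idmap, id2label=None):
    """Return {class name or id: fraction of pixels}, largest share first."""
    ids, counts = np.unique(idmap, return_counts=True)
    shares = counts / idmap.size
    order = np.argsort(-shares)
    names = id2label or {}
    # Fall back to the raw integer id when no name is known for a class.
    return {names.get(int(ids[i]), int(ids[i])): float(shares[i]) for i in order}


# Toy id map; with the Quick Start variables this would be
# class_coverage(idmap, model.config.id2label).
toy = np.array([[0, 0, 1], [0, 2, 1]])
coverage = class_coverage(toy, {0: "road", 1: "sidewalk"})
print(coverage)
```
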
## Intended Uses
- **Road segmentation** for visually impaired users
- Support for Korean HD maps and road maintenance
- Benchmarking Korea-specific datasets for academic and research purposes
### Out-of-Scope
- μ˜λ£ŒΒ·μœ„μ„±Β·μ‹€λ‚΄ λ“± λΉ„λ„λ‘œ 도메인
- 개인 μ‹λ³„Β·κ°μ‹œ λ“± 민감 μž‘μ—…
---
## Limitations & Risks
- **Korean roads only**: performance degrades in other conditions, such as roads outside Korea, extreme low light, and heavy rain
- Detection of partially occluded people is unstable → use only as an assistive aid
---
## Citation
@misc{KoalaSeg2025,
title = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
author = {RoadSight Team},
year = {2025},
url = {https://huggingface.co/gj5520/KoalaSeg}
}