---
library_name: transformers
tags:
- vision
- image-segmentation
- universal-segmentation
- korean-road
- oneformer
- distillation
- aihub
model_name: koalaseg
---

# KoalaSeg 🐨🛣️

## Colab Inference

[Open in Colab](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)

_KOrean lAyered assistive Segmentation_
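
A **Universal Segmentation** model built specifically for Korean road and pedestrian environments.

Starting from a OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, the ground truth (GT) was produced by a **layered ensemble** of the following sources, applied in this order:

1. Manually annotated XML polygons
2. A Korean-specific model trained on the AIHUB road and pedestrian environment data: Surface Mask (5k) + Polygon (500)
3. Cityscapes masks

The resulting GT was then used to **distill** an Edge-ViT 20 M student model.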
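
The layered ensemble described above can be pictured as a priority merge of label maps. The sketch below is only an illustration under stated assumptions: it assumes that earlier sources in the list take priority over later ones, and it uses a hypothetical `IGNORE` value of 255 for unlabelled pixels. Neither detail is stated in this card, and the function name `layered_ensemble` is made up for the example.

```python
import numpy as np

IGNORE = 255  # hypothetical "no label" value; the card does not specify one


def layered_ensemble(layers, ignore=IGNORE):
    """Merge label maps in priority order: earlier layers win where they have a label."""
    merged = np.full_like(layers[0], ignore)
    for layer in layers:
        # Fill only pixels that no higher-priority layer has labelled yet.
        unfilled = (merged == ignore) & (layer != ignore)
        merged[unfilled] = layer[unfilled]
    return merged


# Toy 2x3 label maps standing in for the three sources, highest priority first.
manual_polygons = np.array([[7, IGNORE, IGNORE],
                            [IGNORE, IGNORE, IGNORE]], dtype=np.uint8)
korean_model = np.array([[3, 3, IGNORE],
                         [IGNORE, 4, IGNORE]], dtype=np.uint8)
cityscapes = np.array([[1, 1, 1],
                       [1, 1, IGNORE]], dtype=np.uint8)

gt = layered_ensemble([manual_polygons, korean_model, cityscapes])
print(gt)
```

With this ordering, a pixel keeps the manual polygon label when one exists, falls back to the Korean-specific model next, and only then to the Cityscapes mask. Pixels that no source labels stay at `IGNORE`.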

---

## Model Details

- **Developed by**: Team RoadSight
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
- **Framework**: 🤗 Transformers & PyTorch

---

## Training Data

AIHUB sidewalk and pedestrian environment dataset (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):

- **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**
- **Surface Masking**: 50,000 images (road surface condition masks) → **5,000 used**
- **Depth Prediction**: 170,000 images (stereo depth)

GT was generated for a total of **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) by layered ensembling followed by Morph Open/Close + MedianBlur(17px).
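
The clean-up step named above (morphological open/close followed by a 17 px median blur) can be sketched as follows. This is a NumPy-only illustration, not the authors' pipeline. The card gives only the median window size, so the 3×3 structuring element and the choice to process one binary class mask at a time are assumptions, and `clean_binary_mask` is a hypothetical helper name. In practice, OpenCV's `cv2.morphologyEx` and `cv2.medianBlur` provide the same operations far more efficiently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(mask, k):
    """Return every k x k neighbourhood of a 2-D array (edge-padded to keep the shape)."""
    padded = np.pad(mask, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k))


def erode(mask, k=3):
    return _windows(mask, k).min(axis=(-2, -1))


def dilate(mask, k=3):
    return _windows(mask, k).max(axis=(-2, -1))


def clean_binary_mask(mask, morph_k=3, median_k=17):
    """Open, then close, then median-filter a 0/1 mask.

    morph_k=3 is an assumed kernel size; median_k=17 follows the card.
    """
    opened = dilate(erode(mask, morph_k), morph_k)    # open: removes small specks
    closed = erode(dilate(opened, morph_k), morph_k)  # close: fills small holes
    # On a 0/1 mask the median acts as a majority vote inside each window.
    smoothed = np.median(_windows(closed, median_k), axis=(-2, -1))
    return smoothed.astype(mask.dtype)


# Toy mask: a 20x20 region with one isolated speck and one pinhole.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[6:26, 6:26] = 1
mask[1, 1] = 1    # speck outside the region
mask[15, 15] = 0  # pinhole inside the region

cleaned = clean_binary_mask(mask)
print(cleaned[1, 1], cleaned[15, 15])  # speck removed, pinhole filled
```

The sliding-window approach materialises every neighbourhood, so it is only practical for small arrays like this toy example.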

---

## Speeds & Sizes (512×512, batch=1)

| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
|---------------------|---------------------|--------------------|---------------------|--------------------------|
| **A100** | 3.58 s (0.28 FPS) | 3.74 s (0.27 FPS) | 0.15 s (6.67 FPS) | **0.14 s (7.25 FPS)** |
| **T4** | 5.61 s (0.18 FPS) | 6.01 s (0.17 FPS) | 0.39 s (2.60 FPS) | **0.31 s (3.27 FPS)** |
| **CPU (i9-12900K)** | 124 s (0.008 FPS) | 150 s (0.007 FPS) | 26.6 s (0.038 FPS) | **18.4 s (0.054 FPS)** |
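
To make the table easier to compare, the snippet below converts per-image latency to FPS and computes how much faster koalaseg is than the Cityscapes baseline. The latencies are copied from the table. The helper names are illustrative only. FPS recomputed from the rounded latencies can differ slightly from the FPS values in the table, presumably because the published latencies are rounded.

```python
# Latencies in seconds per image, copied from the table (512x512, batch=1).
latency_s = {
    "A100": {"baseline": 3.58, "koalaseg": 0.14},
    "T4": {"baseline": 5.61, "koalaseg": 0.31},
    "CPU (i9-12900K)": {"baseline": 124.0, "koalaseg": 18.4},
}


def fps(seconds_per_image):
    """Frames per second is the reciprocal of per-image latency."""
    return 1.0 / seconds_per_image


def speedup(baseline_s, student_s):
    """How many times faster the student is than the baseline."""
    return baseline_s / student_s


for device, t in latency_s.items():
    print(
        f"{device}: {fps(t['koalaseg']):.2f} FPS, "
        f"{speedup(t['baseline'], t['koalaseg']):.1f}x faster than baseline"
    )
```

On these figures the distilled model is roughly 26x faster than the baseline on an A100, 18x on a T4, and 7x on the CPU.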

---

## Quick Start
```python
from io import BytesIO

import matplotlib.pyplot as plt
import numpy as np
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForUniversalSegmentation

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to(device).eval()

# 1. Download image -------------------------------------------
url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # stop early if the download failed
img = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
# PIL reports (width, height); target_sizes expects (height, width).
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
class_ids = np.unique(idmap)
cmap = plt.get_cmap("tab20", max(20, len(class_ids)))
mask_rgb = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(class_ids):
    if cid == 0:  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)

mask_img = Image.fromarray(mask_rgb)
overlay = Image.blend(img, mask_img, alpha=0.6)  # alpha=0.6 emphasizes the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```
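
As a possible follow-up to the Quick Start, the class-id map can be summarised as the share of pixels per class. The sketch below is an illustration only: `class_pixel_share` is a hypothetical helper, and the label names in the toy call are invented. With the real model, `model.config.id2label` could be passed as the mapping, assuming the checkpoint's config provides meaningful names.

```python
import numpy as np


def class_pixel_share(idmap, id2label=None):
    """Return {class name or id: fraction of pixels}, largest share first."""
    ids, counts = np.unique(idmap, return_counts=True)
    shares = counts / idmap.size
    order = np.argsort(-shares)  # descending by share
    names = id2label or {}
    # Fall back to the raw integer id when no name is known for a class.
    return {names.get(int(ids[i]), int(ids[i])): float(shares[i]) for i in order}


# Toy map standing in for the `idmap` array from the Quick Start snippet.
toy_idmap = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 2]])
toy_labels = {0: "road", 1: "sidewalk"}  # invented names for illustration

share = class_pixel_share(toy_idmap, toy_labels)
print(share)
```

Classes without an entry in the mapping are reported by their integer id, so the helper also works when no label names are available.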

## Intended Uses

- **Road segmentation** for visually impaired users
- Support for Korean HD mapping and road maintenance
- Benchmarking on Korean-specific datasets for academic and research purposes

### Out-of-Scope

- Non-road domains such as medical, satellite, and indoor imagery
- Sensitive tasks such as personal identification and surveillance

---

## Limitations & Risks

- **Korean roads only**: performance degrades in other conditions, such as roads outside Korea, extremely low light, and heavy rain
- Detection of partially occluded people is unreliable → use only as an assistive aid

---

## Citation

@misc{KoalaSeg2025,
  title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year   = {2025},
  url    = {https://huggingface.co/gj5520/KoalaSeg}
}