---
library_name: transformers
tags:
- vision
- image-segmentation
- universal-segmentation
- korean-road
- oneformer
- distillation
- aihub
model_name: koalaseg
---
# KoalaSeg
## Colab Inference
[Open in Colab](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)

_KOrean lAyered assistive Segmentation_

A **Universal Segmentation** model specialized for Korean road and pedestrian environments.
Starting from a OneFormer teacher based on `shi-labs/oneformer_cityscapes_swin_large`, ground truth was produced by a **layered ensemble** of, in order:

1. XML polygon annotations
2. A Korean-adapted model trained on AIHUB road/pedestrian-environment Surface Mask (5k) + Polygon (500) data
3. Cityscapes masks

and an **Edge-ViT 20 M** student model was **distilled** on this GT.
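A minimal sketch of how such a layered ensemble can be assembled: each source contributes an id map, earlier layers take precedence, and later layers only fill pixels that are still unlabeled. The precedence rule and the `ignore_id = 255` convention are assumptions for illustration, not a description of the released pipeline.

```python
import numpy as np

def layered_ensemble(layers, ignore_id=255):
    """Stack per-source id maps: earlier layers win, later layers
    only fill pixels still marked ignore_id (unlabeled)."""
    out = np.full_like(layers[0], ignore_id)
    for layer in layers:
        fill = out == ignore_id          # pixels nobody has labeled yet
        out[fill] = layer[fill]
    return out

# Toy 2x2 example (class ids are illustrative, 255 = unlabeled)
poly   = np.array([[255,   1], [255, 255]], dtype=np.uint8)  # 1. XML polygons
kroad  = np.array([[  2,   2], [255, 255]], dtype=np.uint8)  # 2. K-Road model
citysc = np.array([[  3,   3], [  3,   3]], dtype=np.uint8)  # 3. Cityscapes
gt = layered_ensemble([poly, kroad, citysc])
print(gt.tolist())  # [[2, 1], [3, 3]]
```

Note how the polygon label `1` survives even where the other layers disagree, while Cityscapes only backfills pixels no higher-priority source covered.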
---
## Model Details
- **Developed by**: Team RoadSight
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
- **Framework**: π€ Transformers & PyTorch
---
## Training Data
AIHUB sidewalk/pedestrian-environment dataset (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
- **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**
- **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 used**
- **Depth Prediction**: 170,000 images (stereo depth)

A total of **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were layer-ensembled, then GT was generated after Morph Open/Close + MedianBlur(17 px).
---
## Speeds & Sizes (512×512, batch=1)
| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
|-----------------------|---------------------|--------------------|-----------------|--------------------|
| **A100**              | 3.58 s → 0.28 FPS   | 3.74 s → 0.27 FPS  | 0.15 s → 6.67 FPS  | **0.14 s → 7.25 FPS**  |
| **T4**                | 5.61 s → 0.18 FPS   | 6.01 s → 0.17 FPS  | 0.39 s → 2.60 FPS  | **0.31 s → 3.27 FPS**  |
| **CPU (i9-12900K)**   | 124 s → 0.008 FPS   | 150 s → 0.007 FPS  | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
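Numbers like those above can be reproduced with a simple timing loop. This is a sketch: `measure_fps` and the warm-up/run counts are our own choices, not part of the released code; CUDA is synchronized so asynchronous kernel launches do not hide GPU time.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, inputs, warmup=3, runs=10):
    """Average per-call latency (seconds) and FPS for batch=1 inputs."""
    for _ in range(warmup):          # warm-up: allocator / autotune effects
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # wait for all queued GPU work
    sec = (time.perf_counter() - start) / runs
    return sec, 1.0 / sec

# e.g. with the Quick Start objects: sec, fps = measure_fps(model, inputs)
```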
---
## Quick Start
```python
from transformers import AutoProcessor, AutoModelForUniversalSegmentation
import torch, requests, matplotlib.pyplot as plt, numpy as np
from PIL import Image
from io import BytesIO

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda").eval()

# 1. Download image -------------------------------------------
url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, stream=True)
img = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
cmap = plt.get_cmap("tab20", max(20, len(np.unique(idmap))))
mask_rgb = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(np.unique(idmap)):
    if cid == 0:  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)
mask_img = Image.fromarray(mask_rgb)
overlay = Image.blend(img, mask_img, alpha=0.6)  # 0.6 → emphasize the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```
## Intended Uses
- **Road segmentation** to assist visually impaired pedestrians
- Korean HD map construction and road maintenance support
- Benchmarking Korean-specific datasets for educational and research purposes

### Out-of-Scope
- Non-road domains such as medical, satellite, or indoor imagery
- Sensitive tasks such as personal identification or surveillance
---
## Limitations & Risks
- **Specialized for Korean roads**: performance degrades in foreign, extremely low-light, or heavy-rain environments
- Detection of partially occluded pedestrians is unreliable → use as an assistive aid only
---
## Citation
```bibtex
@misc{KoalaSeg2025,
  title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year   = {2025},
  url    = {https://huggingface.co/gj5520/KoalaSeg}
}
```