---
library_name: transformers
tags:
  - vision
  - image-segmentation
  - universal-segmentation
  - korean-road
  - oneformer
  - distillation
  - aihub
model_name: koalaseg
---

# KoalaSeg 🐨🛣️

## Colab Inference
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)

_KOrean lAyered assistive Segmentation_ 

![Inference Demo](./overlay_ft_20250621_130747.png)  

A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.  
Starting from a OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, the following sources were combined in a **layered ensemble**, in this order:
1. Hand-annotated XML polygons  
2. A Korea-specific model trained on AIHUB road and pedestrian-environment Surface Mask (5k) + Polygon (500) data  
3. Cityscapes masks  

The resulting GT was used to **distill** an Edge-ViT 20 M student model.
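
The layering order above can be illustrated with a minimal sketch. It assumes that "layered" means a higher-priority source overrides lower ones wherever it has a label, and that id `0` marks an unlabeled pixel; neither detail is stated in this card, and `layered_ensemble` and the toy arrays are hypothetical names used only for illustration.

```python
import numpy as np

UNLABELED = 0  # assumed "no label" id; the model card does not specify one


def layered_ensemble(layers):
    """Merge label maps in priority order: earlier layers win where labeled."""
    merged = np.full_like(layers[0], UNLABELED)
    for layer in layers:
        # Fill only pixels that no higher-priority layer has labeled yet.
        empty = merged == UNLABELED
        merged[empty] = layer[empty]
    return merged


# Toy 2x3 label maps standing in for the three sources listed above.
xml_polygons = np.array([[5, 0, 0], [0, 0, 0]])
korean_model = np.array([[7, 7, 0], [0, 7, 0]])
cityscapes = np.array([[1, 1, 1], [1, 1, 0]])

gt = layered_ensemble([xml_polygons, korean_model, cityscapes])
print(gt)
```

Pixels labeled by the hand-annotated polygons are never overwritten, and the Cityscapes masks only fill whatever the first two sources left empty.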

---

## Model Details

- **Developed by**: Team RoadSight  
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`  
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)  
- **Framework**: 🤗 Transformers & PyTorch

---

## Training Data

AIHUB sidewalk and pedestrian-environment data (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):

- **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)  
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 images used**  
- **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 images used**  
- **Depth Prediction**: 170,000 images (stereo depth)

A total of **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were labeled with the layered ensemble;  
the GT was generated after Morph Open/Close + MedianBlur (17px).
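
The Morph Open/Close + MedianBlur step can be sketched as follows. This is a plain-NumPy stand-in, not the authors' pipeline: it assumes a binary per-class mask, a 3×3 structuring element for opening and closing (the card gives only the 17px median size), and edge-replicated borders. `window_filter` and `clean_mask` are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_filter(img, k, reduce):
    """Apply `reduce` (np.min, np.max or np.median) over each k x k window."""
    padded = np.pad(img, k // 2, mode="edge")
    windows = sliding_window_view(padded, (k, k))
    return reduce(windows, axis=(-2, -1)).astype(img.dtype)


def clean_mask(mask, morph_k=3, median_k=17):
    """Open, close, then median-filter a binary mask (values 0 or 1)."""
    # Opening = erosion (min) then dilation (max): removes small specks.
    opened = window_filter(window_filter(mask, morph_k, np.min), morph_k, np.max)
    # Closing = dilation then erosion: fills small holes.
    closed = window_filter(window_filter(opened, morph_k, np.max), morph_k, np.min)
    # Median filter smooths the remaining boundary.
    return window_filter(closed, median_k, np.median)


# Toy mask: a 20x20 block with one hole, plus one isolated noise pixel.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[6:26, 6:26] = 1
mask[15, 15] = 0  # hole inside the block
mask[1, 1] = 1    # isolated speck

cleaned = clean_mask(mask)
print(cleaned[15, 15], cleaned[1, 1])
```

In this toy case the speck is removed by the opening and the hole is filled by the closing; the 17px median then rounds the corners of the block.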

---

## Speeds & Sizes (512×512, batch=1)

| Device                | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg**       |
|-----------------------|---------------------|--------------------|-----------------|--------------------|
| **A100**              | 3.58 s → 0.28 FPS   | 3.74 s → 0.27 FPS  | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
| **T4**                | 5.61 s → 0.18 FPS   | 6.01 s → 0.17 FPS  | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
| **CPU (i9-12900K)**   | 124 s → 0.008 FPS   | 150 s → 0.007 FPS  | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
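
Each cell in the table pairs a per-image latency with its reciprocal in FPS. The sketch below shows one way such numbers could be collected; it is not the benchmark script used for this card, and `mean_latency`, `fps`, and the dummy workload are hypothetical. When timing a GPU model, `torch.cuda.synchronize()` would also be needed before reading the clock, which this CPU-only sketch omits.

```python
import time


def mean_latency(fn, warmup=2, runs=5):
    """Average wall-clock seconds per call of `fn`, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def fps(latency_s):
    """Frames per second for batch=1 is the reciprocal of the latency."""
    return 1.0 / latency_s


def dummy_inference():
    # Stand-in workload; replace with a real forward pass to benchmark a model.
    return sum(range(10_000))


latency = mean_latency(dummy_inference)
print(f"{latency:.6f} s -> {fps(latency):.2f} FPS")
print(round(fps(3.58), 2))  # matches the A100 baseline cell: 0.28
```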

---

## Quick Start
```python
from transformers import AutoProcessor, AutoModelForUniversalSegmentation
import torch, requests, matplotlib.pyplot as plt, numpy as np
from PIL import Image
from io import BytesIO

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
device = "cuda" if torch.cuda.is_available() else "cpu"   # fall back to CPU when no GPU is present
proc  = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to(device).eval()

# 1. Download image -------------------------------------------
url  = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, timeout=30)
resp.raise_for_status()                                    # stop early on a failed download
img  = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map ---------------------------------------
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
cmap      = plt.get_cmap("tab20", max(20, len(np.unique(idmap))))
mask_rgb  = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(np.unique(idmap)):
    if cid == 0:                  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)

mask_img = Image.fromarray(mask_rgb)
overlay  = Image.blend(img, mask_img, alpha=0.6)   # 0.6 → emphasize the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```


## Intended Uses
- **Road segmentation** for visually impaired users
- Support for Korean HD maps and road maintenance
- Benchmarking on Korea-specific datasets for academic and research purposes

### Out-of-Scope
- μ˜λ£ŒΒ·μœ„μ„±Β·μ‹€λ‚΄ λ“± λΉ„λ„λ‘œ 도메인
- 개인 μ‹λ³„Β·κ°μ‹œ λ“± 민감 μž‘μ—…

---

## Limitations & Risks
- **Korean roads only**: performance degrades outside Korea and in conditions such as extreme low light or heavy rain
- Detection of partially occluded people is unstable → use only as an assistive aid

---

## Citation
@misc{KoalaSeg2025,
  title = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
  author = {RoadSight Team},
  year = {2025},
  url   = {https://huggingface.co/gj5520/KoalaSeg}
}