Update README.md
README.md
CHANGED
@@ -8,71 +8,117 @@ tags:
 - oneformer
 - distillation
 - aihub
-
-model_name: KoalaSeg-Edge-ViT
 ---

-#
-

-
-

 ---

 ## Model Details
-
-
-
-
-
-| **Framework** | 🤗 Transformers v4.41 / PyTorch 2.3 |
-| **License** | CC BY 4.0 |

 ---

 ## Training Data
-
-
-
-
-
-

 ---

-## Speeds & Sizes

-| Device
-
-| **A100**
-| **T4**
-| **CPU (i9-12900K)**

 ---

 ## Evaluation (Korean domestic test set)
-
-
-
-

 ---

 ## Quick Start
-```python
 from transformers import AutoProcessor, AutoModelForUniversalSegmentation
-import torch,
 from PIL import Image

-model_id = "
-proc
-model

-
 inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
 with torch.no_grad():
     out = model(**inputs)

 idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
-plt.
 - oneformer
 - distillation
 - aihub
+model_name: koalaseg
 ---

+# koalaseg 🐨🛣️
+_KOrean lAyered assistive Segmentation_

+
+
+A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
+Ground truth (GT) was produced by **layered ensembling** around a OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, combining the following sources in this order:
+1. Manually annotated XML polygons
+2. A Korea-specific model trained on AIHUB road and pedestrian environment data: Surface Mask (5k) + Polygon (500)
+3. Cityscapes masks
+The resulting GT was then used to **distill** an Edge-ViT 20 M student model.
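
The layered ensembling described above can be illustrated with a minimal sketch. It assumes that every source yields a per-pixel label map of the same size, that an `IGNORE` value (255 here, an assumption) marks pixels a source leaves unlabeled, and that sources listed earlier take priority over later ones. The function name `layered_ensemble` and the toy label values are hypothetical and are not taken from the KoalaSeg code.

```python
from typing import List, Sequence

IGNORE = 255  # assumed "no label" value; the real pipeline may use another

LabelMap = List[List[int]]


def layered_ensemble(layers: Sequence[LabelMap], ignore: int = IGNORE) -> LabelMap:
    """Merge label maps in priority order.

    For each pixel, keep the label from the first layer that does not
    mark that pixel as `ignore`.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    height, width = len(layers[0]), len(layers[0][0])
    merged = [[ignore] * width for _ in range(height)]
    for layer in layers:  # earlier layers win
        for y in range(height):
            for x in range(width):
                if merged[y][x] == ignore and layer[y][x] != ignore:
                    merged[y][x] = layer[y][x]
    return merged


# Toy 2x3 example, priority: polygons > Korea-specific model > Cityscapes masks
polygons = [[1, IGNORE, IGNORE], [IGNORE, IGNORE, IGNORE]]
korean = [[2, 2, IGNORE], [IGNORE, 3, IGNORE]]
cityscapes = [[0, 0, 0], [0, 0, IGNORE]]

gt = layered_ensemble([polygons, korean, cityscapes])
print(gt)
```

Pixels that no layer labels stay at `IGNORE`, so a training loss could skip them. Whether the actual pipeline resolves overlaps by strict priority is not stated in the model card.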

 ---

 ## Model Details
+
+- **Developed by**: Team RoadSight
+- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
+- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
+- **Framework**: 🤗 Transformers & PyTorch

 ---

 ## Training Data
+
+AIHUB sidewalk and pedestrian environment data (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
+
+- **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)
+- **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes), of which **500 were used**
+- **Surface Masking**: 50,000 images (road surface condition masks), of which **5,000 were used**
+- **Depth Prediction**: 170,000 images (stereo depth)
+
+A total of **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were combined by layer ensemble;
+GT was generated after Morph Open/Close + MedianBlur (17 px).

 ---

+## Speeds & Sizes (512×512, batch=1)

+| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
+|-----------------------|---------------------|--------------------|-----------------|--------------------|
+| **A100** | 3.58 s (0.28 FPS) | 3.74 s (0.27 FPS) | 0.15 s (6.67 FPS) | **0.14 s (7.25 FPS)** |
+| **T4** | 5.61 s (0.18 FPS) | 6.01 s (0.17 FPS) | 0.39 s (2.60 FPS) | **0.31 s (3.27 FPS)** |
+| **CPU (i9-12900K)** | 124 s (0.008 FPS) | 150 s (0.007 FPS) | 26.6 s (0.038 FPS) | **18.4 s (0.054 FPS)** |
+
+Model size: 83 MB (INT8 quantized)
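
As a reading aid for the table above, the sketch below recomputes FPS as the reciprocal of per-image latency at batch size 1 and derives a speedup ratio. The helper names are hypothetical. Recomputing from the rounded latencies does not reproduce every table entry exactly (for example, 0.14 s gives about 7.14 FPS, not 7.25), presumably because the table was computed from unrounded timings; that explanation is an assumption.

```python
def fps_from_latency(seconds_per_image: float) -> float:
    """Throughput at batch size 1 is the reciprocal of per-image latency."""
    if seconds_per_image <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / seconds_per_image


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# A100 latencies copied from the table (seconds per 512x512 image)
latencies_a100 = {"Baseline Cityscapes": 3.58, "koalaseg": 0.14}

for name, seconds in latencies_a100.items():
    print(f"{name}: {fps_from_latency(seconds):.2f} FPS")

print(f"A100 speedup: {speedup(3.58, 0.14):.1f}x")
```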

 ---

 ## Evaluation (Korean domestic test set)
+
+| Metric | Baseline | **koalaseg** |
+|-----------------------|----------|--------------|
+| mIoU (all classes) | 0.55 | **0.81** |
+| F1 (road vs. sidewalk) | 0.58 | **0.89** |
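
The model card does not say how these metrics were computed. The sketch below shows one common way to obtain them from flattened label maps: mIoU averaged over the classes present, and a binary F1 that treats road as the positive class. Both the function names and the reading of "road vs. sidewalk" as a binary F1 are assumptions.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that occur in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def binary_f1(pred: Sequence[int], target: Sequence[int], positive: int) -> float:
    """F1 score for one class treated as the positive label."""
    tp = sum(1 for p, t in zip(pred, target) if p == positive and t == positive)
    fp = sum(1 for p, t in zip(pred, target) if p == positive and t != positive)
    fn = sum(1 for p, t in zip(pred, target) if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


ROAD, SIDEWALK = 0, 1  # hypothetical class ids
target = [ROAD, ROAD, ROAD, SIDEWALK, SIDEWALK, SIDEWALK]
pred = [ROAD, ROAD, SIDEWALK, SIDEWALK, SIDEWALK, ROAD]

print(f"mIoU: {mean_iou(pred, target, num_classes=2):.2f}")
print(f"F1 (road): {binary_f1(pred, target, positive=ROAD):.2f}")
```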

 ---

 ## Quick Start
 from transformers import AutoProcessor, AutoModelForUniversalSegmentation
+import torch, requests, matplotlib.pyplot as plt
 from PIL import Image
+from io import BytesIO

+model_id = "gj5520/KoalaSeg"
+proc = AutoProcessor.from_pretrained(model_id)
+model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda")

+# 1. Download the image
+url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
+resp = requests.get(url, stream=True)
+img = Image.open(BytesIO(resp.content)).convert("RGB")
+
+# 2. Preprocess + model inference
 inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
 with torch.no_grad():
     out = model(**inputs)

+# 3. Post-process and visualize
 idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
+plt.figure(figsize=(8, 8))
+plt.imshow(idmap.cpu(), cmap="tab20")
+plt.axis("off")
+plt.show()
+
+
+## Intended Uses
+- **Real-time road segmentation** for visually impaired users
+- Support for Korean HD maps and road maintenance
+- Korea-specific dataset benchmark for academic and research purposes
+
+### Out-of-Scope
+- Non-road domains such as medical, satellite, and indoor imagery
+- Sensitive areas such as personal identification and surveillance
+
+---
+
+## Limitations & Risks
+- **Korean roads only**: performance degrades overseas and in conditions such as extreme low light or heavy rain
+- Detection of partially occluded people is unstable; use only as an auxiliary aid
+- May be affected by AIHUB annotation bias
+
+---
+
+## Citation
+@misc{KoalaSeg2025,
+  title = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
+  author = {RoadSight Team},
+  year = {2025},
+  url = {https://huggingface.co/gj5520/KoalaSeg}
+}