gj5520 committed on
Commit ecf7c22 · verified · 1 Parent(s): c68ab56

Update README.md

Files changed (1): README.md +82 -36
README.md CHANGED
@@ -8,71 +8,117 @@ tags:
  - oneformer
  - distillation
  - aihub
- license: cc-by-4.0
- model_name: KoalaSeg-Edge-ViT
  ---

- # KoalaSeg-Edge-ViT 🐨🛣️
- **KoalaSeg = _KOrean lAyered assistive Segmentation_**

- A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
- A OneFormer Edge-ViT student fine-tuned on a unified GT built by overlaying a triple-layer mask (XML polygons ▶ a Korea-specific road model trained on AIHUB road/pedestrian data ▶ OneFormer-Cityscapes).

  ---

  ## Model Details
- | Item | Description |
- |------|------|
- | **Developed by** | Team RoadSight |
- | **Model type** | Edge-ViT backbone + OneFormer head<br>(semantic-only task token) |
- | **Finetuned from** | `shi-labs/oneformer_cityscapes_swin_large` |
- | **Framework** | 🤗 Transformers v4.41 / PyTorch 2.3 |
- | **License** | CC BY 4.0 |

  ---

  ## Training Data
- | Source | Count | Annotation |
- |------|------|-----------|
- | **AIHUB Road·Pedestrian Environment** <br> (road lanes, sidewalks, crosswalks) | 5,615 images | official pixel-wise GT |
- | Self-captured local roads | 9,042 images | CVAT XML polygons |
- | Street View derivatives | 3,712 images | OneFormer-Cityscapes pseudo-masks |
- | **Total** | **18,369 images** | 3-layer compositing → Morph Open/Close + MedianBlur (17 px) |

  ---

- ## Speeds & Sizes *(512 × 512, batch 1)*

- | Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **KoalaSeg (ft)** |
- |--------|--------------------|-------------------|---------------|------------------|
- | **A100** | 3.58 s → 0.28 FPS | 3.74 s → 0.27 FPS | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
- | **T4** | 5.61 s → 0.18 FPS | 6.01 s → 0.17 FPS | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
- | **CPU (i9-12900K)** | 124 s | 150 s | 26.6 s | **18.4 s** |

  ---

  ## Evaluation (domestic test set)
- | Metric | Baseline | **KoalaSeg** |
- |--------|----------|--------------|
- | mIoU (all classes) | 0.55 | **0.81** |
- | F1 – road vs. sidewalk | 0.58 | **0.89** |

  ---

  ## Quick Start
- ```python
  from transformers import AutoProcessor, AutoModelForUniversalSegmentation
- import torch, numpy as np, matplotlib.pyplot as plt
  from PIL import Image

- model_id = "roadsight/KoalaSeg-Edge-ViT"
- proc = AutoProcessor.from_pretrained(model_id)
- model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda")

- img = Image.open("korean_road.jpg").convert("RGB")
  inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
  with torch.no_grad():
      out = model(**inputs)

  idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
- plt.imshow(idmap.cpu()); plt.axis("off"); plt.show()
- ```

  - oneformer
  - distillation
  - aihub
+ model_name: koalaseg

  ---

+ # koalaseg 🐨🛣️
+ _KOrean lAyered assistive Segmentation_

+ ![Inference Demo](./overlay_ft_20250621_130747.png)
+
+ A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
+ Starting from the OneFormer teacher `shi-labs/oneformer_cityscapes_swin_large`, GT was generated by a **layered ensemble** of
+ 1. hand-drawn XML polygons
+ 2. a Korea-specific model trained on AIHUB Road·Pedestrian Environment Surface Mask (5k) + Polygon (500) data
+ 3. Cityscapes masks
+ in that priority order, and an Edge-ViT 20 M student model was **distilled** on it.
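The layered ensemble described above amounts to a priority overlay: each higher-priority mask overwrites lower layers wherever it carries a real label. A minimal numpy sketch (the `VOID` value, class IDs, and the toy maps are illustrative assumptions, not the project's actual pipeline):

```python
import numpy as np

VOID = 255  # assumed "unlabeled" value

def layered_ensemble(xml_poly, k_road, cityscapes, void=VOID):
    """Composite three label maps; earlier-named layers have higher priority."""
    out = cityscapes.copy()               # lowest-priority layer as the base
    for layer in (k_road, xml_poly):      # overlay in ascending priority
        labeled = layer != void
        out[labeled] = layer[labeled]     # overwrite only where labeled
    return out

# Toy 2x2 label maps (class IDs are made up)
xml  = np.array([[VOID, 1], [VOID, VOID]], dtype=np.uint8)
road = np.array([[2, VOID], [2, VOID]], dtype=np.uint8)
city = np.array([[3, 3], [3, 3]], dtype=np.uint8)
gt = layered_ensemble(xml, road, city)    # [[2, 1], [2, 3]]
```

The void-aware overwrite is what lets sparse hand-drawn polygons win over dense pseudo-masks without erasing their coverage elsewhere.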
 
  ---

  ## Model Details
+
+ - **Developed by**: Team RoadSight
+ - **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
+ - **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
+ - **Framework**: 🤗 Transformers & PyTorch

  ---

  ## Training Data
+
+ AIHUB Sidewalk·Pedestrian Environment dataset (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
+
+ - **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)
+ - **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**
+ - **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 used**
+ - **Depth Prediction**: 170,000 images (stereo depth)
+
+ In total, **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were layer-ensembled, then
+ Morph Open/Close + MedianBlur (17 px) were applied to produce the GT.
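The Morph Open/Close + MedianBlur cleanup named above is typically done with OpenCV (`cv2.morphologyEx`, `cv2.medianBlur`). As a dependency-free illustration, here is a numpy sketch of a 3×3 open/close and a k×k label-map median; the kernel sizes are illustrative, not the 17 px production setting:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _reduce3x3(mask, op, init):
    """Combine each pixel's 3x3 neighbourhood with op (zero-padded edges)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.full(mask.shape, init, dtype=bool)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = op(out, p[dy:dy + h, dx:dx + w].astype(bool))
    return out

def erode(mask):
    return _reduce3x3(mask, np.logical_and, True)

def dilate(mask):
    return _reduce3x3(mask, np.logical_or, False)

def open_close(mask):
    """Opening (erode->dilate) removes speckles; closing (dilate->erode) fills pinholes."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))

def median_smooth(label_map, k=3):
    """k x k median filter over an integer label map (edge-padded)."""
    r = k // 2
    p = np.pad(label_map, r, mode="edge")
    h, w = label_map.shape
    win = sliding_window_view(p, (k, k)).reshape(h, w, -1)
    return np.median(win, axis=-1).astype(label_map.dtype)

noisy = np.zeros((9, 9), dtype=bool)
noisy[4, 4] = True                  # a lone speckle
clean = open_close(noisy)           # speckle removed
```

Opening suppresses isolated mis-labeled pixels from the pseudo-masks, closing fills small holes inside otherwise solid regions, and the median pass smooths the layer-boundary jaggies.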
 
  ---

+ ## Speeds & Sizes (512×512, batch=1)

+ | Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
+ |---------------------|---------------------|--------------------|--------------------|------------------------|
+ | **A100** | 3.58 s → 0.28 FPS | 3.74 s → 0.27 FPS | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
+ | **T4** | 5.61 s → 0.18 FPS | 6.01 s → 0.17 FPS | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
+ | **CPU (i9-12900K)** | 124 s → 0.008 FPS | 150 s → 0.007 FPS | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
+
+ Model size: 83 MB (INT8 quantized)
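Latency → FPS figures like those above come from averaged wall-clock timing; a minimal harness sketch (warm-up and repeat counts are arbitrary assumptions, and the dummy workload stands in for `model(**inputs)`):

```python
import time

def benchmark(infer, warmup=3, repeats=10):
    """Time a zero-argument callable; return (mean seconds per call, FPS)."""
    for _ in range(warmup):           # warm-up: caches, cudnn autotune, JIT
        infer()
    t0 = time.perf_counter()
    for _ in range(repeats):
        infer()
    mean_s = (time.perf_counter() - t0) / repeats
    return mean_s, 1.0 / mean_s

# Dummy workload standing in for a real forward pass
mean_s, fps = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean_s * 1000:.3f} ms -> {fps:.1f} FPS")
```

For GPU timings, a `torch.cuda.synchronize()` before each `perf_counter()` read is needed so queued kernels are actually counted.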
 
  ---

  ## Evaluation (domestic test set)
+
+ | Metric | Baseline | **koalaseg** |
+ |------------------------|----------|--------------|
+ | mIoU (all classes) | 0.55 | **0.81** |
+ | F1 – road vs. sidewalk | 0.58 | **0.89** |
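Both metrics in the table can be derived from a per-pixel confusion matrix; a minimal numpy sketch (the 2-class toy labels are illustrative assumptions, while the real evaluation presumably runs over the full class set):

```python
import numpy as np

def confusion(pred, gt, n_classes):
    """Confusion matrix: rows = ground-truth class, cols = predicted class."""
    idx = gt.astype(np.int64) * n_classes + pred.astype(np.int64)
    return np.bincount(idx.ravel(), minlength=n_classes ** 2).reshape(n_classes, n_classes)

def miou(cm):
    """Mean IoU over classes: TP / (TP + FP + FN), averaged."""
    tp = np.diag(cm).astype(float)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return float(np.mean(tp / np.maximum(denom, 1)))

def f1(cm, cls):
    """F1 for one class from its precision and recall."""
    tp = float(cm[cls, cls])
    prec = tp / max(cm[:, cls].sum(), 1)
    rec = tp / max(cm[cls, :].sum(), 1)
    return 2 * prec * rec / max(prec + rec, 1e-12)

# Toy 2-class example: 0 = road, 1 = sidewalk
gt   = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
cm = confusion(pred, gt, 2)   # [[1, 1], [0, 2]]
```

In practice the confusion matrix is accumulated over all test images before the metrics are computed.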
 
  ---

  ## Quick Start

+ ```python
  from transformers import AutoProcessor, AutoModelForUniversalSegmentation
+ import torch, requests, matplotlib.pyplot as plt
  from PIL import Image
+ from io import BytesIO

+ model_id = "gj5520/KoalaSeg"
+ proc = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda")

+ # 1. Download an image
+ url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
+ resp = requests.get(url, stream=True)
+ img = Image.open(BytesIO(resp.content)).convert("RGB")
+
+ # 2. Preprocess + run the model
  inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
  with torch.no_grad():
      out = model(**inputs)

+ # 3. Post-process and visualize
  idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
+ plt.figure(figsize=(8, 8))
+ plt.imshow(idmap.cpu(), cmap="tab20")
+ plt.axis("off")
+ plt.show()
+ ```

+ ---
+
+ ## Intended Uses
+ - **Real-time road segmentation** for visually impaired users
+ - Support for Korean HD maps and road maintenance
+ - Benchmarking on Korean datasets for academic and research purposes
+
+ ### Out-of-Scope
+ - Non-road domains such as medical, satellite, or indoor imagery
+ - Sensitive tasks such as personal identification or surveillance
+
+ ---
+
+ ## Limitations & Risks
+ - **Korean roads only**: performance degrades overseas and under extreme low light or heavy rain
+ - Detection of partially occluded people is unstable → use as an assistive aid only
+ - Possible bias inherited from AIHUB annotations
+
+ ---
+
+ ## Citation
+ @misc{KoalaSeg2025,
+   title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
+   author = {RoadSight Team},
+   year   = {2025},
+   url    = {https://huggingface.co/gj5520/KoalaSeg}
+ }