gj5520 committed on
Commit ecf7c22 · verified · 1 Parent(s): c68ab56

Update README.md

Files changed (1): README.md +82 -36
README.md CHANGED
@@ -8,71 +8,117 @@ tags:
  - oneformer
  - distillation
  - aihub
- license: cc-by-4.0
- model_name: KoalaSeg-Edge-ViT
  ---

- # KoalaSeg-Edge-ViT 🐨🛣️
- **KoalaSeg = _KOrean lAyered assistive Segmentation_**

- A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
- A OneFormer Edge-ViT student fine-tuned on a unified GT built by overlaying a triple-layer mask (XML polygons ▶ a Korea-specific road model trained on AIHUB road/pedestrian data ▶ OneFormer-Cityscapes).

  ---

  ## Model Details
- | Item | Description |
- |------|------|
- | **Developed by** | Team RoadSight |
- | **Model type** | Edge-ViT backbone + OneFormer head<br>(semantic-only task token) |
- | **Finetuned from** | `shi-labs/oneformer_cityscapes_swin_large` |
- | **Framework** | 🤗 Transformers v4.41 / PyTorch 2.3 |
- | **License** | CC BY 4.0 |

  ---

  ## Training Data
- | Source | Count | Annotation |
- |------|------|-----------|
- | **AIHUB Road·Pedestrian Environment** <br> (road lanes, sidewalks, crosswalks) | 5,615 images | official pixel-wise GT |
- | Self-captured local roads | 9,042 images | CVAT XML polygons |
- | Street View derivatives | 3,712 images | OneFormer-Cityscapes pseudo-masks |
- | **Total** | **18,369 images** | 3-layer compositing → Morph Open/Close + MedianBlur (17 px) |

  ---

- ## Speeds & Sizes *(512 × 512, batch 1)*

- | Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **KoalaSeg (ft)** |
- |--------|--------------------|-------------------|---------------|------------------|
- | **A100** | 3.58 s → 0.28 FPS | 3.74 s → 0.27 FPS | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
- | **T4** | 5.61 s → 0.18 FPS | 6.01 s → 0.17 FPS | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
- | **CPU (i9-12900K)** | 124 s | 150 s | 26.6 s | **18.4 s** |

  ---

  ## Evaluation (domestic test set)
- | Metric | Baseline | **KoalaSeg** |
- |--------|----------|--------------|
- | mIoU (all classes) | 0.55 | **0.81** |
- | F1 – road vs. sidewalk | 0.58 | **0.89** |

  ---

  ## Quick Start
- ```python
  from transformers import AutoProcessor, AutoModelForUniversalSegmentation
- import torch, numpy as np, matplotlib.pyplot as plt
  from PIL import Image

- model_id = "roadsight/KoalaSeg-Edge-ViT"
- proc = AutoProcessor.from_pretrained(model_id)
- model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda")

- img = Image.open("korean_road.jpg").convert("RGB")
  inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
  with torch.no_grad():
      out = model(**inputs)

  idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
- plt.imshow(idmap.cpu()); plt.axis("off"); plt.show()
- ```

  - oneformer
  - distillation
  - aihub
+ model_name: koalaseg

  ---

+ # koalaseg 🐨🛣️
+ _KOrean lAyered assistive Segmentation_

+ ![Inference Demo](./overlay_ft_20250621_130747.png)
+
+ A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
+ Starting from the OneFormer teacher `shi-labs/oneformer_cityscapes_swin_large`, GT was generated by a **layered ensemble** of
+ 1. hand-drawn XML polygons
+ 2. a Korea-specific model trained on AIHUB Road·Pedestrian Environment Surface Mask (5k) + Polygon (500) data
+ 3. Cityscapes masks
+ in that priority order, and an Edge-ViT 20 M student model was **distilled** on it.
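The layered ensemble described above amounts to a priority overlay: each higher-priority mask overwrites lower layers wherever it carries a real label. A minimal numpy sketch (the `VOID` value, class IDs, and the toy maps are illustrative assumptions, not the project's actual pipeline):

```python
import numpy as np

VOID = 255  # assumed "unlabeled" value

def layered_ensemble(xml_poly, k_road, cityscapes, void=VOID):
    """Composite three label maps; earlier-named layers have higher priority."""
    out = cityscapes.copy()               # lowest-priority layer as the base
    for layer in (k_road, xml_poly):      # overlay in ascending priority
        labeled = layer != void
        out[labeled] = layer[labeled]     # overwrite only where labeled
    return out

# Toy 2x2 label maps (class IDs are made up)
xml  = np.array([[VOID, 1], [VOID, VOID]], dtype=np.uint8)
road = np.array([[2, VOID], [2, VOID]], dtype=np.uint8)
city = np.array([[3, 3], [3, 3]], dtype=np.uint8)
gt = layered_ensemble(xml, road, city)    # [[2, 1], [2, 3]]
```

The void-aware overwrite is what lets sparse hand-drawn polygons win over dense pseudo-masks without erasing their coverage elsewhere.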
 
  ---

  ## Model Details
+
+ - **Developed by**: Team RoadSight
+ - **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
+ - **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
+ - **Framework**: 🤗 Transformers & PyTorch

  ---

  ## Training Data
+
+ AIHUB Sidewalk·Pedestrian Environment dataset (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):
+
+ - **Bounding Box**: 350,000 images (box annotations for 29 obstacle classes)
+ - **Polygon**: 100,000 images (polygon annotations for 29 obstacle classes) → **500 used**
+ - **Surface Masking**: 50,000 images (road-surface condition masks) → **5,000 used**
+ - **Depth Prediction**: 170,000 images (stereo depth)
+
+ In total, **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k) were layer-ensembled, then
+ Morph Open/Close + MedianBlur (17 px) were applied to produce the GT.
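The Morph Open/Close + MedianBlur cleanup named above is typically done with OpenCV (`cv2.morphologyEx`, `cv2.medianBlur`). As a dependency-free illustration, here is a numpy sketch of a 3×3 open/close and a k×k label-map median; the kernel sizes are illustrative, not the 17 px production setting:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _reduce3x3(mask, op, init):
    """Combine each pixel's 3x3 neighbourhood with op (zero-padded edges)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.full(mask.shape, init, dtype=bool)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = op(out, p[dy:dy + h, dx:dx + w].astype(bool))
    return out

def erode(mask):
    return _reduce3x3(mask, np.logical_and, True)

def dilate(mask):
    return _reduce3x3(mask, np.logical_or, False)

def open_close(mask):
    """Opening (erode->dilate) removes speckles; closing (dilate->erode) fills pinholes."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))

def median_smooth(label_map, k=3):
    """k x k median filter over an integer label map (edge-padded)."""
    r = k // 2
    p = np.pad(label_map, r, mode="edge")
    h, w = label_map.shape
    win = sliding_window_view(p, (k, k)).reshape(h, w, -1)
    return np.median(win, axis=-1).astype(label_map.dtype)

noisy = np.zeros((9, 9), dtype=bool)
noisy[4, 4] = True                  # a lone speckle
clean = open_close(noisy)           # speckle removed
```

Opening suppresses isolated mis-labeled pixels from the pseudo-masks, closing fills small holes inside otherwise solid regions, and the median pass smooths the layer-boundary jaggies.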
 
  ---

+ ## Speeds & Sizes (512×512, batch=1)

+ | Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
+ |---------------------|---------------------|--------------------|--------------------|------------------------|
+ | **A100** | 3.58 s → 0.28 FPS | 3.74 s → 0.27 FPS | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
+ | **T4** | 5.61 s → 0.18 FPS | 6.01 s → 0.17 FPS | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
+ | **CPU (i9-12900K)** | 124 s → 0.008 FPS | 150 s → 0.007 FPS | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |
+
+ Model size: 83 MB (INT8 quantized)
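Latency → FPS figures like those above come from averaged wall-clock timing; a minimal harness sketch (warm-up and repeat counts are arbitrary assumptions, and the dummy workload stands in for `model(**inputs)`):

```python
import time

def benchmark(infer, warmup=3, repeats=10):
    """Time a zero-argument callable; return (mean seconds per call, FPS)."""
    for _ in range(warmup):           # warm-up: caches, cudnn autotune, JIT
        infer()
    t0 = time.perf_counter()
    for _ in range(repeats):
        infer()
    mean_s = (time.perf_counter() - t0) / repeats
    return mean_s, 1.0 / mean_s

# Dummy workload standing in for a real forward pass
mean_s, fps = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean_s * 1000:.3f} ms -> {fps:.1f} FPS")
```

For GPU timings, a `torch.cuda.synchronize()` before each `perf_counter()` read is needed so queued kernels are actually counted.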
 
  ---

  ## Evaluation (domestic test set)
+
+ | Metric | Baseline | **koalaseg** |
+ |------------------------|----------|--------------|
+ | mIoU (all classes) | 0.55 | **0.81** |
+ | F1 – road vs. sidewalk | 0.58 | **0.89** |
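Both metrics in the table can be derived from a per-pixel confusion matrix; a minimal numpy sketch (the 2-class toy labels are illustrative assumptions, while the real evaluation presumably runs over the full class set):

```python
import numpy as np

def confusion(pred, gt, n_classes):
    """Confusion matrix: rows = ground-truth class, cols = predicted class."""
    idx = gt.astype(np.int64) * n_classes + pred.astype(np.int64)
    return np.bincount(idx.ravel(), minlength=n_classes ** 2).reshape(n_classes, n_classes)

def miou(cm):
    """Mean IoU over classes: TP / (TP + FP + FN), averaged."""
    tp = np.diag(cm).astype(float)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return float(np.mean(tp / np.maximum(denom, 1)))

def f1(cm, cls):
    """F1 for one class from its precision and recall."""
    tp = float(cm[cls, cls])
    prec = tp / max(cm[:, cls].sum(), 1)
    rec = tp / max(cm[cls, :].sum(), 1)
    return 2 * prec * rec / max(prec + rec, 1e-12)

# Toy 2-class example: 0 = road, 1 = sidewalk
gt   = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
cm = confusion(pred, gt, 2)   # [[1, 1], [0, 2]]
```

In practice the confusion matrix is accumulated over all test images before the metrics are computed.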
 
  ---

  ## Quick Start

+ ```python
  from transformers import AutoProcessor, AutoModelForUniversalSegmentation
+ import torch, requests, matplotlib.pyplot as plt
  from PIL import Image
+ from io import BytesIO

+ model_id = "gj5520/KoalaSeg"
+ proc = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda")

+ # 1. Download an image
+ url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
+ resp = requests.get(url, stream=True)
+ img = Image.open(BytesIO(resp.content)).convert("RGB")
+
+ # 2. Preprocess + run the model
  inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")
  with torch.no_grad():
      out = model(**inputs)

+ # 3. Post-process and visualize
  idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
+ plt.figure(figsize=(8, 8))
+ plt.imshow(idmap.cpu(), cmap="tab20")
+ plt.axis("off")
+ plt.show()
+ ```

+ ---
+
+ ## Intended Uses
+ - **Real-time road segmentation** for visually impaired users
+ - Support for Korean HD maps and road maintenance
+ - Benchmarking on Korean datasets for academic and research purposes
+
+ ### Out-of-Scope
+ - Non-road domains such as medical, satellite, or indoor imagery
+ - Sensitive tasks such as personal identification or surveillance
+
+ ---
+
+ ## Limitations & Risks
+ - **Korean roads only**: performance degrades overseas and under extreme low light or heavy rain
+ - Detection of partially occluded people is unstable → use as an assistive aid only
+ - Possible bias inherited from AIHUB annotations
+
+ ---
+
+ ## Citation
+ @misc{KoalaSeg2025,
+   title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
+   author = {RoadSight Team},
+   year   = {2025},
+   url    = {https://huggingface.co/gj5520/KoalaSeg}
+ }