---
library_name: transformers
tags:
- vision
- image-segmentation
- universal-segmentation
- korean-road
- oneformer
- distillation
- aihub
model_name: koalaseg
---

# KoalaSeg 🐨🛣️

## Colab Inference

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LXWqtv-7lba128iEzhgSwXtEzRpRF7I0?usp=sharing)

_KOrean lAyered assistive Segmentation_

![Inference Demo](./overlay_ft_20250621_130747.png)

KoalaSeg is a **Universal Segmentation** model built specifically for Korean road and pedestrian environments. Starting from the OneFormer teacher model based on `shi-labs/oneformer_cityscapes_swin_large`, ground truth (GT) was generated by a **layered ensemble** of the following sources, in this order:

1. Hand-annotated XML polygons
2. A Korea-specific model trained on AIHUB road and pedestrian environment data: Surface Mask (5k) + Polygon (500)
3. Cityscapes masks

An Edge-ViT 20 M student model was then **distilled** on that GT.

---

## Model Details

- **Developed by**: Team RoadSight
- **Base model**: `shi-labs/oneformer_cityscapes_swin_large`
- **Model type**: Edge-ViT 20 M + OneFormer head (semantic task)
- **Framework**: 🤗 Transformers & PyTorch

---

## Training Data

AIHUB sidewalk and pedestrian environment data (https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=189):

- **Bounding Box**: 350,000 images (box annotations for 29 obstacle types)
- **Polygon**: 100,000 images (polygon annotations for 29 obstacle types) → **500 images used**
- **Surface Masking**: 50,000 images (road surface condition masks) → **5,000 images used**
- **Depth Prediction**: 170,000 images (stereo depth)

Total: **18,369 images** (AIHUB 5.5k + self-captured 9k + Street View 3.7k)

The GT is produced by the layered ensemble, followed by morphological open/close and MedianBlur (17 px).
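
The layered ensemble described above can be illustrated with a minimal NumPy sketch. It assumes that "layered" means priority-ordered merging, where an earlier source overrides later ones wherever it provides a label; the card does not spell this out. The function name `layered_merge`, the `UNLABELED` id of 255, and the toy class ids are all hypothetical, and the morphological cleanup and median blur steps are not shown.

```python
import numpy as np

UNLABELED = 255  # hypothetical "no label" id, not taken from the model card


def layered_merge(layers, unlabeled=UNLABELED):
    """Merge class-id maps in priority order: earlier layers win."""
    merged = np.full_like(layers[0], unlabeled)
    for layer in layers:
        # Fill only pixels that are still empty and that this layer labels.
        fill = (merged == unlabeled) & (layer != unlabeled)
        merged[fill] = layer[fill]
    return merged


# Toy 2x3 maps standing in for the three sources (assumed priority order).
manual = np.array([[1, 255, 255], [255, 255, 255]], dtype=np.uint8)
korean = np.array([[2, 2, 255], [255, 2, 255]], dtype=np.uint8)
cityscapes = np.array([[3, 3, 3], [3, 3, 255]], dtype=np.uint8)

merged = layered_merge([manual, korean, cityscapes])
print(merged)
```

In this toy case the hand-annotated pixel survives, the Korea-specific prediction fills what the manual layer left empty, and the Cityscapes mask only covers the remainder. A pixel that no source labels stays `UNLABELED`.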

---

## Speeds & Sizes (512×512, batch=1)

Each cell shows latency per image → throughput.

| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **koalaseg** |
|---------------------|---------------------|--------------------|--------------------|------------------------|
| **A100** | 3.58 s → 0.28 FPS | 3.74 s → 0.27 FPS | 0.15 s → 6.67 FPS | **0.14 s → 7.25 FPS** |
| **T4** | 5.61 s → 0.18 FPS | 6.01 s → 0.17 FPS | 0.39 s → 2.60 FPS | **0.31 s → 3.27 FPS** |
| **CPU (i9-12900K)** | 124 s → 0.008 FPS | 150 s → 0.007 FPS | 26.6 s → 0.038 FPS | **18.4 s → 0.054 FPS** |

---

## Quick Start

```python
from io import BytesIO

import matplotlib.pyplot as plt
import numpy as np
import requests
import torch
from PIL import Image
from transformers import AutoModelForUniversalSegmentation, AutoProcessor

# 0. Load model & processor -----------------------------------
model_id = "gj5520/KoalaSeg"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to(device).eval()

# 1. Download image -------------------------------------------
url = "https://pds.joongang.co.kr/news/component/htmlphoto_mmdata/202205/21/1200738c-61c0-4a51-83c4-331f53d4dcdc.jpg"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail early on a bad download
img = Image.open(BytesIO(resp.content)).convert("RGB")

# 2. Pre-process & inference ----------------------------------
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

# 3-A. Get class-id map (PIL size is (W, H); target_sizes wants (H, W))
idmap = proc.post_process_semantic_segmentation(
    out, target_sizes=[img.size[::-1]]
)[0].cpu().numpy()

# 3-B. Convert idmap → RGB mask + overlay ---------------------
class_ids = np.unique(idmap)
cmap = plt.get_cmap("tab20", max(20, len(class_ids)))
mask_rgb = np.zeros((*idmap.shape, 3), dtype=np.uint8)
for idx, cid in enumerate(class_ids):
    if cid == 0:  # keep background black
        continue
    mask_rgb[idmap == cid] = (np.array(cmap(idx)[:3]) * 255).astype(np.uint8)

mask_img = Image.fromarray(mask_rgb)
overlay = Image.blend(img, mask_img, alpha=0.6)  # 0.6 → emphasize the mask

# 4. Show overlay ---------------------------------------------
plt.figure(figsize=(8, 8))
plt.imshow(overlay)
plt.axis("off")
plt.show()
```

## Intended Uses

- **Road segmentation** for visually impaired users
- Support for Korean HD maps and road maintenance
- Benchmarking on Korea-specific datasets for academic and research purposes

### Out-of-Scope

- Non-road domains such as medical, satellite, and indoor imagery
- Sensitive tasks such as personal identification and surveillance

---

## Limitations & Risks

- **Korean roads only**: performance degrades on roads outside Korea and in conditions such as extreme low light or heavy rain
- Detection of partially occluded people is unstable → use only as an assistive aid

---

## Citation

    @misc{KoalaSeg2025,
      title  = {KoalaSeg: Layered Distillation for Korean Road Universal Segmentation},
      author = {RoadSight Team},
      year   = {2025},
      url    = {https://huggingface.co/gj5520/KoalaSeg}
    }