# LandViT-DPT-330m

Land Cover Vision Transformer for Semantic Segmentation
This model performs semantic segmentation of aerial imagery, classifying land cover into 11 classes.
## Model Information

- Architecture: Vision Transformer (ViT) + DPT-style decoder
- Input resolution: 512×512×3
- Number of classes: 11 (including background)
- Framework: PyTorch + Hugging Face Transformers
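The "330m" suffix in the model name suggests a parameter count of roughly 330M; a quick sanity-check sketch (nothing here is repo-specific beyond the model ID):

```python
from transformers import AutoModelForSemanticSegmentation

model = AutoModelForSemanticSegmentation.from_pretrained(
    "JDONE-Research/LandViT-DPT-330m", trust_remote_code=True
)
# Total parameter count; the model name suggests roughly 330M
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")
```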
## Performance

| Metric | Value |
|---|---|
| mIoU | (to be added) |
## Classes
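The per-class IDs and names ship with the model configuration. A minimal sketch to list them, assuming the remote config populates the standard `id2label` mapping:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "JDONE-Research/LandViT-DPT-330m", trust_remote_code=True
)
# Print each class ID alongside its label (11 entries, including background)
for class_id, name in sorted(config.id2label.items()):
    print(class_id, name)
```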
## Usage

### Installation

```bash
pip install torch torchvision transformers pillow numpy
```
### Basic Inference

```python
import torch
from PIL import Image
from transformers import AutoModelForSemanticSegmentation, AutoImageProcessor

# Load the model and image processor
model = AutoModelForSemanticSegmentation.from_pretrained(
    "JDONE-Research/LandViT-DPT-330m",
    trust_remote_code=True
)
processor = AutoImageProcessor.from_pretrained(
    "JDONE-Research/LandViT-DPT-330m",
    trust_remote_code=True
)
model.eval()

# Load an image (convert to RGB in case of grayscale or RGBA input)
image = Image.open("image.jpg").convert("RGB")

# Preprocess
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process to a per-pixel class map at the original resolution
segmentation_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[(image.height, image.width)]
)[0]
print(f"Segmentation shape: {segmentation_map.shape}")
```
### Visualization

```python
import numpy as np
import matplotlib.pyplot as plt

# Get the per-class color palette from the model config
palette = model.config.label_colors

# Build an RGB color mask from the class map (move it to NumPy first)
seg = segmentation_map.cpu().numpy()
color_mask = np.zeros((*seg.shape, 3), dtype=np.uint8)
for class_id, color in enumerate(palette):
    color_mask[seg == class_id] = color

# Show the original image and the segmentation side by side
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
axes[0].imshow(image)
axes[0].set_title("Original Image")
axes[0].axis("off")
axes[1].imshow(color_mask)
axes[1].set_title("Segmentation Result")
axes[1].axis("off")
plt.tight_layout()
plt.show()
```
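To inspect class boundaries against the underlying scene, you can also blend the mask over the source image (a simple alpha blend; the 0.5 weights are an arbitrary choice):

```python
# 50/50 alpha blend of the original image and the color mask
overlay = (0.5 * np.asarray(image, dtype=np.float32)
           + 0.5 * color_mask.astype(np.float32)).astype(np.uint8)
plt.imshow(overlay)
plt.axis("off")
plt.show()
```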
### Batch Inference

```python
# Process several images in one forward pass
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(4)]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Resize each class map back to its source image's resolution
segmentation_maps = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[(img.height, img.width) for img in images]
)

for i, seg_map in enumerate(segmentation_maps):
    print(f"Image {i} segmentation shape: {seg_map.shape}")
```
## Limitations

- Optimized for aerial imagery; performance may degrade on images from other domains.
- Trained at a resolution of 512×512; larger images should be processed in tiles (see the tiling sketch after this list).
- Trained on imagery of regions in South Korea; fine-tuning may be needed for other regions.
- Some classes (e.g., parking lots, vinyl greenhouses) may show relatively low accuracy due to class imbalance.
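A minimal sliding-window tiling sketch for images larger than 512×512. The helper below is hypothetical (not part of this repository); it resolves overlaps by overwriting, whereas averaging logits across overlapping tiles would give smoother seams:

```python
import numpy as np
import torch

def segment_tiled(model, processor, image, tile=512, stride=448):
    # Hypothetical helper: split a large image into overlapping 512x512 tiles,
    # run the model on each tile, and stitch the class maps back together.
    # Assumes the image is at least `tile` pixels on each side.
    w, h = image.size
    full_map = np.zeros((h, w), dtype=np.int64)
    # Tile origins, clamped so the last tile ends exactly at the image border
    ys = sorted({min(y, h - tile) for y in range(0, h - tile + stride, stride)})
    xs = sorted({min(x, w - tile) for x in range(0, w - tile + stride, stride)})
    for top in ys:
        for left in xs:
            crop = image.crop((left, top, left + tile, top + tile))
            inputs = processor(images=crop, return_tensors="pt")
            with torch.no_grad():
                outputs = model(**inputs)
            seg = processor.post_process_semantic_segmentation(
                outputs, target_sizes=[(tile, tile)]
            )[0]
            # Later tiles overwrite earlier ones in the overlap region
            full_map[top:top + tile, left:left + tile] = seg.cpu().numpy()
    return full_map
```

Predictions can disagree at tile seams; a smaller `stride` (more overlap) mitigates this at extra compute cost.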
## Citation

```bibtex
@misc{landvit2026,
  title={LandViT: Land Cover Vision Transformer for Semantic Segmentation},
  author={JDONE Inc.},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/JDONE-Research/LandViT-DPT-330m}}
}
```