	---
	title: Text Image Seg
	emoji: 📊
	colorFrom: red
	colorTo: green
	sdk: gradio
	sdk_version: 5.49.1
	app_file: app.py
	pinned: false
	---

	# 🎯 Text-Guided Image Segmentation Demo

	A text-guided image segmentation app based on Grounding DINO and SAM (Segment Anything Model), with an interactive interface built with Gradio.

	## 🚀 Quick Start

	### Local Host
	- ```pip install -r requirements.txt```
	- ```gradio app.py``` or ```python app.py```
	- The app will start at `http://localhost:7860`.
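
As a rough illustration of the local setup, the sketch below wires up a Gradio interface with an image input, a text prompt input, an image output, and a debug text output. `build_demo`, `placeholder_segment`, and `main` are hypothetical names that are not taken from `app.py`, and the placeholder function only stands in for the real model pipeline.

```python
def placeholder_segment(image, prompt):
    """Stand-in for the real pipeline: returns the image unchanged plus a debug string."""
    return image, f"prompt={prompt!r}; no model loaded in this sketch"


def build_demo(segment_fn):
    """Build a Gradio interface around any (image, prompt) -> (image, text) function."""
    import gradio as gr  # imported lazily so this module loads without gradio installed

    return gr.Interface(
        fn=segment_fn,
        inputs=[gr.Image(type="pil", label="Image"), gr.Textbox(label="Text prompt")],
        outputs=[gr.Image(label="Segmentation"), gr.Textbox(label="Debug info")],
    )


def main():
    # 7860 is Gradio's default port, matching the URL mentioned above.
    build_demo(placeholder_segment).launch(server_port=7860)
```

Calling `main()` would serve the interface locally. When launching with `gradio app.py` instead, the Gradio CLI by default looks for a module-level object named `demo`, so a real `app.py` would typically assign the interface to that name.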

	### Hugging Face Spaces
	[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md-dark.svg)](https://huggingface.co/spaces/tingul4/text-image-seg)


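	## 🎮 Usage

	1. Upload an image: click the input area or drag an image into it
	2. Enter a text prompt:
	   - Single object: `car`, `person`, `sky`
	   - Multiple objects: `car. sky. road.` (separated by periods)
	3. Click the Segment button to start segmentation
	4. View the results:
	   - The left side shows the segmentation mask (a different color per object)
	   - The right side shows debug information (number of detections, labels, confidence scores)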
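
The period-separated prompt format above can be sketched as a small normalization step. This is an assumption about how a prompt might be prepared, not a description of what `app.py` does. `normalize_prompt` is a hypothetical helper, and the lowercase-plus-trailing-period convention follows the usage notes for the Hugging Face Grounding DINO checkpoints.

```python
def normalize_prompt(raw: str) -> str:
    """Turn free-form input into lowercase, period-terminated phrases."""
    # Split on periods, trim whitespace, and drop empty fragments.
    phrases = [part.strip().lower() for part in raw.split(".")]
    phrases = [phrase for phrase in phrases if phrase]
    # Re-join so that every phrase ends with its own period.
    return " ".join(f"{phrase}." for phrase in phrases)


def split_labels(prompt: str) -> list:
    """Recover the individual object phrases from a normalized prompt."""
    return [part.strip() for part in prompt.split(".") if part.strip()]
```

For example, `normalize_prompt("Car. Sky")` yields `car. sky.`, and `split_labels` turns that back into `["car", "sky"]`.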

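	## 💡 Tips

	### Improving Detection Accuracy

	1. Use specific descriptions: `blue car` is more precise than `car`
	2. Adjust the threshold: the current threshold is 0.15 and can be changed in the source code
	3. Try multiple times: try different wordings, or use English descriptions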
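
To make the effect of the threshold concrete, here is a minimal sketch of confidence filtering. The README does not say where the 0.15 value lives in `app.py`, so the `Detection` class, the `filter_detections` helper, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(detections, threshold=0.15):
    """Keep only detections whose confidence is at least `threshold`."""
    return [det for det in detections if det.score >= threshold]


# Made-up detections, for illustration only.
candidates = [
    Detection("car", 0.62, (10.0, 20.0, 200.0, 180.0)),
    Detection("car", 0.15, (220.0, 40.0, 300.0, 120.0)),
    Detection("sky", 0.09, (0.0, 0.0, 640.0, 100.0)),
]
kept = filter_detections(candidates)
```

A lower threshold keeps more candidate boxes at the cost of more false positives, while a higher one keeps only confident detections and may miss objects.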

	## 📦 Dependencies

	- `gradio` - interactive web interface
	- `transformers` - Hugging Face model library
	- `torch` - PyTorch deep learning framework
	- `Pillow` - image processing
	- `numpy` - numerical computing

	## 📝 Code Structure

	```
	text-image-seg/
	├── app.py
	├── requirements.txt
	├── README.md
	├── sample_images/
	├── .gitattributes
	└── .gitignore
	```

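	## 🔧 Technical Architecture

	### Models

	- Grounding DINO (`IDEA-Research/grounding-dino-base`)
	  - Used for zero-shot object detection
	  - Locates objects from a text description

	- SAM (`facebook/sam-vit-base`)
	  - Used for precise segmentation
	  - Generates high-quality masks from the detection boxes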
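
Loading the two checkpoints listed above with `transformers` might look like the following sketch. The checkpoint IDs come from this README, but `load_models` and its `device` parameter are hypothetical, and the actual loading code in `app.py` may differ.

```python
DETECTOR_ID = "IDEA-Research/grounding-dino-base"
SEGMENTER_ID = "facebook/sam-vit-base"


def load_models(device: str = "cpu"):
    """Load the detector and segmenter; weights are downloaded on first use."""
    # Imported lazily so the constants above stay usable without transformers installed.
    from transformers import (
        AutoModelForZeroShotObjectDetection,
        AutoProcessor,
        SamModel,
        SamProcessor,
    )

    detector_processor = AutoProcessor.from_pretrained(DETECTOR_ID)
    detector = AutoModelForZeroShotObjectDetection.from_pretrained(DETECTOR_ID).to(device)
    sam_processor = SamProcessor.from_pretrained(SEGMENTER_ID)
    sam = SamModel.from_pretrained(SEGMENTER_ID).to(device)
    return detector_processor, detector, sam_processor, sam
```

Loading once at startup and reusing the returned objects for every request avoids reloading the weights on each call.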

	### Workflow

	```
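	Input image + text prompt
	↓
	Grounding DINO detects objects
	↓
	SAM generates segmentation masks
	↓
	Multi-object masks are overlaid (a different color per object)
	↓
	Output result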
	```
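
The mask overlay step of this workflow can be sketched with `numpy` as below. This is one plausible way to blend per-object masks in different colors, not necessarily how `app.py` does it. `PALETTE`, `overlay_masks`, and the `alpha` parameter are hypothetical.

```python
import numpy as np

# A small cycling palette of RGB colors, one per detected object.
PALETTE = np.array(
    [[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 0]], dtype=np.float32
)


def overlay_masks(image, masks, alpha=0.5):
    """Blend each boolean (H, W) mask onto an (H, W, 3) uint8 image in its own color."""
    out = image.astype(np.float32)  # astype returns a copy, so the input is untouched
    for index, mask in enumerate(masks):
        color = PALETTE[index % len(PALETTE)]
        # Alpha-blend the object's color into the pixels covered by its mask.
        out[mask] = (1.0 - alpha) * out[mask] + alpha * color
    return out.round().astype(np.uint8)
```

Where masks overlap, later masks are blended on top of earlier ones, so the order of detections affects the final colors.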

	## 🔗 Related Resources

	- [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO)
	- [Segment Anything Model (SAM)](https://github.com/facebookresearch/segment-anything)
	- [Gradio documentation](https://www.gradio.app/docs)