Thibaut committed on
Commit
39593e0
·
1 Parent(s): b2e88b8

Update README with proper YAML metadata

Files changed (1)
  1. README.md +36 -348
README.md CHANGED
@@ -1,378 +1,66 @@
1
- # SAM3 Static Image Segmentation - HuggingFace Deployment
2
-
3
- Production-ready deployment of Meta's SAM3 (Segment Anything Model 3) for text-prompted static image segmentation on HuggingFace Inference Endpoints with Azure Container Registry.
4
-
5
- ## 🚀 Quick Start
6
 
7
- ### Deployments
8
 
9
- This repository supports deployment to **both HuggingFace and Azure AI Foundry**. See [DEPLOYMENT.md](DEPLOYMENT.md) for the dual-deployment guide.
10
 
11
- #### HuggingFace (Current)
12
 
13
- **URL**: `https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud`
14
- **Status**: ✅ Running
15
- **Model**: `facebook/sam3` (Sam3Model for static images)
16
- **Hardware**: NVIDIA A10G GPU (24GB VRAM)
 
17
 
18
- #### Azure AI Foundry (Pending GPU Quota)
19
 
20
- **Registry**: `sam3acr.azurecr.io`
21
- **Status**: ⏳ Waiting for GPU quota approval
22
- **See**: [DEPLOYMENT.md](DEPLOYMENT.md) for deployment instructions
23
 
24
- ### Basic Usage
25
 
26
  ```python
27
  import requests
28
  import base64
29
- from PIL import Image
30
- import io
31
 
32
- # Load and encode image
33
  with open("image.jpg", "rb") as f:
34
      image_b64 = base64.b64encode(f.read()).decode()
35
 
36
- # Request segmentation masks
37
- response = requests.post(
38
-     "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
39
-     json={
40
-         "inputs": image_b64,
41
-         "parameters": {
42
-             "classes": ["pothole", "asphalt", "yellow line", "shadow"]
43
-         }
44
-     }
45
- )
46
-
47
- # Process results
48
- results = response.json()
49
- for result in results:
50
-     label = result["label"]
51
-     score = result["score"]
52
-     mask_b64 = result["mask"]
53
-
54
-     # Decode mask (PNG image as base64)
55
-     mask_bytes = base64.b64decode(mask_b64)
56
-     mask_image = Image.open(io.BytesIO(mask_bytes))
57
-
58
-     print(f"Class: {label}, Score: {score}")
59
-     mask_image.save(f"mask_{label}.png")
60
- ```
61
-
62
- ## 📋 API Reference
63
-
64
- ### POST `/`
65
-
66
- Segment objects in an image using text prompts.
67
-
68
- **Request Body**:
69
- ```json
70
- {
71
-   "inputs": "<base64 encoded JPEG/PNG image>",
72
-   "parameters": {
73
-     "classes": ["object1", "object2", "object3"]
74
-   }
75
- }
76
- ```
77
-
78
- **Response**:
79
- ```json
80
- [
81
-   {
82
-     "label": "object1",
83
-     "score": 1.0,
84
-     "mask": "<base64 encoded PNG mask>"
85
-   },
86
-   {
87
-     "label": "object2",
88
-     "score": 1.0,
89
-     "mask": "<base64 encoded PNG mask>"
90
-   }
91
- ]
92
- ```
93
-
94
- **Mask Format**:
95
- - PNG grayscale image (base64 encoded)
96
- - White pixels (255) = object present
97
- - Black pixels (0) = background
98
- - Same dimensions as input image
99
-
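The request and response shapes described above can be exercised without calling the endpoint. The sketch below is a minimal illustration, not the deployment's own client code: `build_payload` and `parse_results` are hypothetical helper names, and the validation rule (each result must carry `label`, `score`, and `mask`) is an assumption drawn only from the example response in this README.

```python
import base64
import binascii


def build_payload(image_bytes, classes):
    """Build the JSON body for POST `/` from raw image bytes and text prompts."""
    if not classes:
        raise ValueError("at least one class prompt is required")
    return {
        "inputs": base64.b64encode(image_bytes).decode("ascii"),
        "parameters": {"classes": list(classes)},
    }


def parse_results(results):
    """Return (label, score, mask_png_bytes) tuples from a decoded JSON response."""
    parsed = []
    for item in results:
        # Assumed schema: every result has these three keys.
        missing = {"label", "score", "mask"} - set(item)
        if missing:
            raise ValueError(f"result is missing keys: {sorted(missing)}")
        try:
            mask_png = base64.b64decode(item["mask"], validate=True)
        except binascii.Error as exc:
            raise ValueError(f"mask for {item['label']!r} is not valid base64") from exc
        parsed.append((item["label"], float(item["score"]), mask_png))
    return parsed
```

The returned mask bytes are still PNG-encoded; turning them into pixels needs an image library such as Pillow, as the Basic Usage example does.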
100
- ### GET `/health`
101
-
102
- Check endpoint health and GPU status.
103
-
104
- **Response**:
105
- ```json
106
- {
107
-   "status": "healthy",
108
-   "model": "Sam3Model",
109
-   "gpu_available": true,
110
-   "vram": {
111
-     "total_gb": 23.95,
112
-     "allocated_gb": 1.72,
113
-     "free_gb": 22.20,
114
-     "processing_now": 0
115
-   }
116
- }
117
- ```
118
-
119
- ### GET `/metrics`
120
-
121
- Get VRAM metrics.
122
-
123
- **Response**:
124
- ```json
125
- {
126
-   "total_gb": 23.95,
127
-   "allocated_gb": 1.72,
128
-   "free_gb": 22.20,
129
-   "processing_now": 0
130
- }
131
- ```
132
-
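One way a client might use the `/metrics` payload is to decide whether the GPU has room for another request. The sketch below is illustrative only: `vram_headroom` and `can_accept` are hypothetical helpers, the 2 GB threshold is an arbitrary placeholder, and reading `processing_now` as a count of in-flight requests is an interpretation, since this README does not define that field.

```python
def vram_headroom(metrics):
    """Fraction of total VRAM reported as free, between 0.0 and 1.0."""
    total = metrics["total_gb"]
    if total <= 0:
        raise ValueError("total_gb must be positive")
    return metrics["free_gb"] / total


def can_accept(metrics, min_free_gb=2.0, max_in_flight=2):
    """Heuristic gate: enough free VRAM and fewer than max_in_flight requests."""
    return (
        metrics["free_gb"] >= min_free_gb
        and metrics["processing_now"] < max_in_flight
    )


# Sample values copied from the example response above.
sample = {"total_gb": 23.95, "allocated_gb": 1.72, "free_gb": 22.20, "processing_now": 0}
print(f"{vram_headroom(sample):.1%} free, accept={can_accept(sample)}")
```

The default of two in-flight requests mirrors the concurrency figure listed under Performance; both thresholds would need tuning against real workloads.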
133
- ## 🛠️ Deployment Architecture
134
-
135
- ### Components
136
-
137
- - **Model**: `facebook/sam3` (Sam3Model - 3.4GB)
138
- - **Container**: NVIDIA CUDA 12.9.1 + Ubuntu 24.04
139
- - **Registry**: Azure Container Registry `sam3acr4hf.azurecr.io`
140
- - **Endpoint**: HuggingFace Inference Endpoints (Logiroad organization)
141
- - **GPU**: NVIDIA A10G (24GB VRAM)
142
-
143
- ### Repository Structure
144
-
145
- ```
146
- sam3_huggingface/
147
- ├── src/ # Source code
148
- │   ├── app.py # FastAPI inference server
149
- │   └── utils/ # Utility modules
150
- ├── docker/ # Docker configurations
151
- │   ├── Dockerfile # Container definition
152
- │   └── requirements.txt # Python dependencies
153
- ├── deployments/ # Platform-specific deployments
154
- │   ├── huggingface/ # HuggingFace configuration
155
- │   └── azure/ # Azure AI Foundry configuration
156
- ├── scripts/ # Automation scripts
157
- │   ├── deploy_all.sh # Unified deployment
158
- │   └── test/ # Test scripts
159
- ├── docs/ # Documentation
160
- │   └── DEPLOYMENT.md # Deployment guide
161
- ├── assets/ # Static assets
162
- │   ├── test_images/ # Test images
163
- │   └── examples/ # Usage examples
164
- ├── model/ # SAM3 model files (3.4GB)
165
- └── README.md # This file
166
- ```
167
-
168
- ## 🔧 Local Development
169
-
170
- ### Prerequisites
171
-
172
- - Docker with NVIDIA GPU support
173
- - Azure CLI (for ACR access)
174
- - Python 3.11+
175
- - CUDA-compatible GPU (optional, for local testing)
176
-
177
- ### Build Docker Image
178
-
179
- ```bash
180
- docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest -f docker/Dockerfile .
181
- ```
182
-
183
- ### Run Locally (with GPU)
184
-
185
- ```bash
186
- docker run -p 7860:7860 --gpus all \
187
- sam3acr4hf.azurecr.io/sam3-hf:latest
188
- ```
189
-
190
- ### Test Locally
191
-
192
- ```bash
193
- # Using test script
194
- python3 scripts/test/test_api.py
195
-
196
- # Or using example
197
- python3 assets/examples/usage_example.py
198
- ```
199
-
200
- ## 🚒 Deployment
201
-
202
- ### Quick Deploy (Recommended)
203
-
204
- Use the provided deployment script for easy deployment to one or both platforms:
205
-
206
- ```bash
207
- # Deploy to HuggingFace only (default)
208
- ./deploy_all.sh --hf
209
-
210
- # Deploy to Azure AI Foundry only
211
- ./deploy_all.sh --azure
212
-
213
- # Deploy to both platforms
214
- ./deploy_all.sh --all
215
- ```
216
-
217
- The script handles building, tagging, and pushing to both registries automatically.
218
-
219
- ### Manual Deployment
220
-
221
- #### HuggingFace
222
-
223
- ```bash
224
- ./deployments/huggingface/deploy.sh
225
- ```
226
-
227
- See [`deployments/huggingface/README.md`](deployments/huggingface/README.md) for details.
228
-
229
- #### Azure AI Foundry
230
-
231
- ```bash
232
- ./deployments/azure/deploy.sh
233
- ```
234
-
235
- See [`deployments/azure/README.md`](deployments/azure/README.md) for details.
236
-
237
- For complete deployment instructions, see [`docs/DEPLOYMENT.md`](docs/DEPLOYMENT.md).
238
-
239
- ## 📊 Performance
240
-
241
- - **Inference Time**: ~2-3 seconds for 4 classes
242
- - **Throughput**: Limited by GPU (24GB VRAM)
243
- - **Concurrency**: 2 concurrent requests (configurable)
244
- - **Image Size**: Supports up to ~2000x2000 pixels
245
-
246
- ## 🔍 Key Implementation Details
247
-
248
- ### SAM3 Model Selection
249
-
250
- ⚠️ **Important**: Use `Sam3Model` (static images), not `Sam3VideoModel` (video tracking).
251
-
252
- ```python
253
- from transformers import Sam3Model, Sam3Processor
254
-
255
- # ✅ Correct for static images
256
- model = Sam3Model.from_pretrained("facebook/sam3")
257
- processor = Sam3Processor.from_pretrained("facebook/sam3")
258
-
259
- # ❌ Wrong - for video tracking
260
- # model = Sam3VideoModel.from_pretrained("facebook/sam3")
261
- ```
262
-
263
- ### Batch Processing
264
-
265
- To segment multiple objects in ONE image, repeat the image for each text prompt:
266
-
267
- ```python
268
- # For multiple classes in one image
269
- images_batch = [image] * len(classes) # Repeat image
270
- inputs = processor(
271
- images=images_batch,
272
- text=classes,
273
- return_tensors="pt"
274
- )
275
- ```
276
-
277
- ### Dtype Handling
278
-
279
- Only convert floating-point tensors to match model dtype (float16):
280
-
281
- ```python
282
- model_dtype = next(model.parameters()).dtype
283
- inputs = {
284
- k: v.cuda().to(model_dtype) if v.dtype.is_floating_point
285
- else v.cuda()
286
- for k, v in inputs.items()
287
- if isinstance(v, torch.Tensor)
288
- }
289
- ```
290
-
291
- ## 📦 Dependencies
292
-
293
- ```txt
294
- fastapi==0.121.3
295
- uvicorn==0.38.0
296
- torch==2.9.1
297
- torchvision
298
- git+https://github.com/huggingface/transformers.git # SAM3 support
299
- huggingface_hub>=1.0.0,<2.0
300
- numpy>=2.3.0
301
- pillow>=12.0.0
302
- ```
303
-
304
- ## 🐛 Troubleshooting
305
-
306
- ### Endpoint Stuck Initializing
307
-
308
- The 15.7GB Docker image takes 5-10 minutes to pull and initialize. Wait patiently.
309
-
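Rather than waiting by hand, a client could poll `GET /health` until it reports `"status": "healthy"`. The following is a minimal sketch under stated assumptions: `wait_until_healthy` is a hypothetical helper, the 15-minute timeout and 15-second interval are illustrative defaults, and the health check is passed in as a callable so the loop can be exercised without network access.

```python
import time


def wait_until_healthy(check, timeout_s=900.0, interval_s=15.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns a dict with status == "healthy".

    `check` should return the decoded JSON from GET /health, or raise an
    exception while the endpoint is still starting. Returns the healthy
    payload, or raises TimeoutError once `timeout_s` has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            payload = check()
            if payload.get("status") == "healthy":
                return payload
        except Exception:
            pass  # endpoint not reachable yet; keep polling
        if clock() >= deadline:
            raise TimeoutError("endpoint did not become healthy in time")
        sleep(interval_s)
```

With `requests`, `check` could be `lambda: requests.get(f"{url}/health", timeout=10).json()`, where `url` is the endpoint address.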
310
- ### "shape is invalid for input" Error
311
-
312
- Ensure you're repeating the image for each class:
313
- ```python
314
- images_batch = [image] * len(classes)
315
- ```
316
-
317
- ### "dtype mismatch" Error
318
-
319
- Don't convert integer tensors (input_ids, attention_mask) to float16.
320
-
321
- ### Empty/Wrong Masks
322
-
323
- Ensure text prompts match actual image content. SAM3 will try to find matches even for non-existent objects.
324
-
325
- ## 📝 Example: Road Defect Detection
326
-
327
- ```python
328
- import requests
329
- import base64
330
- from PIL import Image
331
- import io
332
-
333
- # Load road image
334
- with open("road.jpg", "rb") as f:
335
-     image_b64 = base64.b64encode(f.read()).decode()
336
-
337
- # Segment road defects
338
  response = requests.post(
339
  "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
340
  json={
341
  "inputs": image_b64,
342
- "parameters": {
343
- "classes": ["pothole", "crack", "debris", "patch"]
344
- }
345
  }
346
  )
347
 
348
- # Save masks
349
- results = response.json()
350
- for result in results:
351
-     mask_bytes = base64.b64decode(result["mask"])
352
-     mask_img = Image.open(io.BytesIO(mask_bytes))
353
-     mask_img.save(f"defect_{result['label']}.png")
354
-     print(f"Found {result['label']} (score: {result['score']:.2f})")
355
  ```
356
 
357
- ## 📚 Resources
358
 
359
- - **Model**: [facebook/sam3 on HuggingFace](https://huggingface.co/facebook/sam3)
360
- - **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
361
- - **Endpoint Management**: [HuggingFace Console](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)
362
 
363
  ## 📄 License
364
 
365
- This deployment uses Meta's SAM3 model. See the [model card](https://huggingface.co/facebook/sam3) for license information.
366
-
367
- ## 🀝 Support
368
 
369
- For issues with:
370
- - **Model/Inference**: Check SAM3 documentation
371
- - **Deployment**: Contact HuggingFace support
372
- - **Azure Registry**: Check ACR credentials and permissions
373
 
374
- ---
375
-
376
- **Last Updated**: 2025-11-22
377
- **Status**: ✅ Production Ready
378
- **Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
 
1
+ ---
2
+ tags:
3
+ - image-segmentation
4
+ - sam
5
+ - custom-docker
6
+ license: mit
7
+ task_categories:
8
+ - image-segmentation
9
+ library_name: transformers
10
+ pipeline_tag: image-segmentation
11
+ ---
12
 
13
+ # SAM3 - Semantic Segmentation Model
14
 
15
+ SAM3 is a semantic segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints.
16
 
17
+ ## 🚀 Deployment
18
 
19
+ - **GitHub Repository**: https://github.com/logiroad/sam3
20
+ - **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
21
+ - **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
22
+ - **Model**: facebook/sam3 (Sam3Model for static images)
23
+ - **Hardware**: NVIDIA A10G (24GB VRAM)
24
 
25
+ ## 📊 Model Architecture
26
 
27
+ Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted semantic segmentation of static images.
 
 
28
 
29
+ ## 🎯 Usage
30
 
31
  ```python
32
  import requests
33
  import base64
 
 
34
 
35
+ # Read image
36
  with open("image.jpg", "rb") as f:
37
      image_b64 = base64.b64encode(f.read()).decode()
38
 
39
+ # Call endpoint
40
  response = requests.post(
41
      "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
42
      json={
43
          "inputs": image_b64,
44
+         "parameters": {"classes": ["pothole", "asphalt"]}
45
      }
46
  )
47
 
48
+ # Get results
49
+ masks = response.json()
50
+ for result in masks:
51
+     print(f"Class: {result['label']}, Score: {result['score']}")
 
 
 
52
  ```
53
 
54
+ ## 📦 Deployment
55
 
56
+ This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.
 
 
57
 
58
  ## 📄 License
59
 
60
+ MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.
 
 
61
 
62
+ ## 🔗 Resources
 
 
 
63
 
64
+ - **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
65
+ - **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
66
+ - **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)