ilessio-aiflowlab commited on
Commit
a2ff36b
·
verified ·
1 Parent(s): 0f0795c

Upload folder using huggingface_hub

Browse files
Files changed (37) hide show
  1. README.md +172 -0
  2. configs/gpu_server.toml +99 -0
  3. configs/paper_mvp.toml +96 -0
  4. frozen_models_manifest.json +44 -0
  5. inference_runs/crossing_objects/masks/mask_0000.png +0 -0
  6. inference_runs/crossing_objects/masks/mask_0001.png +0 -0
  7. inference_runs/crossing_objects/masks/mask_0002.png +0 -0
  8. inference_runs/crossing_objects/masks/mask_0003.png +0 -0
  9. inference_runs/crossing_objects/masks/mask_0004.png +0 -0
  10. inference_runs/crossing_objects/masks/mask_0005.png +0 -0
  11. inference_runs/crossing_objects/masks/mask_0006.png +0 -0
  12. inference_runs/crossing_objects/masks/mask_0007.png +0 -0
  13. inference_runs/crossing_objects/masks/mask_0008.png +0 -0
  14. inference_runs/crossing_objects/masks/mask_0009.png +0 -0
  15. inference_runs/crossing_objects/masks/mask_0010.png +0 -0
  16. inference_runs/crossing_objects/masks/mask_0011.png +0 -0
  17. inference_runs/crossing_objects/masks/mask_0012.png +0 -0
  18. inference_runs/crossing_objects/masks/mask_0013.png +0 -0
  19. inference_runs/crossing_objects/masks/mask_0014.png +0 -0
  20. inference_runs/crossing_objects/masks/mask_0015.png +0 -0
  21. inference_runs/crossing_objects/masks/mask_0016.png +0 -0
  22. inference_runs/crossing_objects/masks/mask_0017.png +0 -0
  23. inference_runs/crossing_objects/masks/mask_0018.png +0 -0
  24. inference_runs/crossing_objects/masks/mask_0019.png +0 -0
  25. inference_runs/crossing_objects/masks/mask_0020.png +0 -0
  26. inference_runs/crossing_objects/masks/mask_0021.png +0 -0
  27. inference_runs/crossing_objects/masks/mask_0022.png +0 -0
  28. inference_runs/crossing_objects/masks/mask_0023.png +0 -0
  29. inference_runs/crossing_objects/metrics.json +17 -0
  30. inference_runs/crossing_objects/overlay.mp4 +0 -0
  31. logs/gpu_validation_results.json +118 -0
  32. pytorch/sam_vit_b_encoder.pt +3 -0
  33. pytorch/sam_vit_b_v1.safetensors +3 -0
  34. pytorch/sam_vit_h_encoder.pt +3 -0
  35. pytorch/sam_vit_h_v1.safetensors +3 -0
  36. pytorch/xmem_frozen.pt +3 -0
  37. pytorch/xmem_v1.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,172 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ tags:
3
+ - robotics
4
+ - anima
5
+ - monad
6
+ - segmentation
7
+ - tracking
8
+ - video-object-segmentation
9
+ - sam
10
+ - xmem
11
+ - robot-flow-labs
12
+ library_name: pytorch
13
+ pipeline_tag: image-segmentation
14
+ license: apache-2.0
15
+ datasets:
16
+ - n/a
17
+ language:
18
+ - en
19
+ ---
20
+
21
+ # MONAD - Interactive Segmentation & Tracking for Robotics
22
+
23
+ Part of the [ANIMA Perception Suite](https://github.com/RobotFlow-Labs) by **Robot Flow Labs** / AIFLOW LABS LIMITED.
24
+
25
+ ## Overview
26
+
27
+ MONAD is an interactive video object segmentation and tracking module designed for robotics applications. It combines **SAM (Segment Anything Model)** for initial mask generation with **XMem** for long-term video object segmentation, providing a unified pipeline for real-time object tracking from a single point or box prompt.
28
+
29
+ ## Paper
30
+
31
+ **IST-ROS: A flexible object segmentation and tracking framework for robotics applications**
32
+ Published in *SoftwareX*, 2025.
33
+
34
+ The paper presents a ROS-integrated framework for interactive segmentation and tracking, enabling robots to segment and track arbitrary objects in real-time from minimal user input.
35
+
36
+ ## Architecture
37
+
38
+ MONAD implements a two-stage inference pipeline:
39
+
40
+ 1. **SAM (Segment Anything Model)** - Generates initial segmentation mask from point/box prompt
41
+ - `vit_h` variant (2.4 GB, higher quality) or `vit_b` variant (358 MB, faster)
42
+ - Produces binary mask from a single click or bounding box
43
+
44
+ 2. **XMem (Video Object Segmentation)** - Propagates mask across video frames
45
+ - Long-term memory with sensory, working, and long-term stores
46
+ - Handles occlusion, appearance changes, and fast motion
47
+ - Configurable memory management for edge deployment
48
+
49
+ ## Exported Formats
50
+
51
+ | Format | File | Size | Use Case |
52
+ |--------|------|------|----------|
53
+ | SafeTensors | `pytorch/sam_vit_h_v1.safetensors` | 2564 MB | SAM vit_h - safe, fast loading |
54
+ | SafeTensors | `pytorch/sam_vit_b_v1.safetensors` | 375 MB | SAM vit_b - edge deployment |
55
+ | SafeTensors | `pytorch/xmem_v1.safetensors` | 249 MB | XMem tracker - safe format |
56
+ | TorchScript | `pytorch/sam_vit_h_encoder.pt` | 2431 MB | SAM vit_h encoder - JIT compiled |
57
+ | TorchScript | `pytorch/sam_vit_b_encoder.pt` | 342 MB | SAM vit_b encoder - JIT compiled |
58
+ | Frozen Bundle | `pytorch/xmem_frozen.pt` | 238 MB | XMem - frozen inference bundle |
59
+
60
+ ## GPU Benchmark Results (NVIDIA L4, 22GB)
61
+
62
+ ### Segmentation (Single Frame)
63
+
64
+ | SAM Model | Resolution | Mean Latency | Throughput | Peak VRAM |
65
+ |-----------|-----------|-------------|------------|-----------|
66
+ | vit_h | 640x480 | 1125.6 ms | 0.9 fps | 5977 MB |
67
+ | vit_b | 640x480 | 324.1 ms | 3.1 fps | 3008 MB |
68
+ | vit_b | 1280x720 | 353.8 ms | 2.8 fps | 3007 MB |
69
+
70
+ ### Tracking (24 Frames, SAM + XMem)
71
+
72
+ | SAM Model | Resolution | Mean Latency | Throughput | Peak VRAM |
73
+ |-----------|-----------|-------------|------------|-----------|
74
+ | vit_h | 640x480 | 87.2 ms/f | 1.9 fps | 18.4 GB |
75
+ | vit_b | 640x480 | 89.4 ms/f | 4.8 fps | 16.3 GB |
76
+ | vit_b | 320x240 | 67.6 ms/f | 5.3 fps | 4.5 GB |
77
+
78
+ > vit_b at 640x480 is the recommended configuration for NVIDIA L4 GPUs.
79
+ > 1280x720 causes OOM on 22GB GPUs. Use 24GB+ for HD tracking.
80
+
81
+ ## Quick Start
82
+
83
+ ```python
84
+ import torch
85
+ from safetensors.torch import load_file
86
+
87
+ # Load SAM vit_b weights (recommended for edge)
88
+ sam_state = load_file("pytorch/sam_vit_b_v1.safetensors")
89
+
90
+ # Load XMem weights
91
+ xmem_state = load_file("pytorch/xmem_v1.safetensors")
92
+ ```
93
+
94
+ ### Full Pipeline Usage
95
+
96
+ ```bash
97
+ # Install MONAD
98
+ git clone https://github.com/RobotFlow-Labs/project_monad
99
+ cd project_monad
100
+ uv sync --extra gpu
101
+
102
+ # Download required base weights separately:
103
+ # - SAM vit_h: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
104
+ # - SAM vit_b: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
105
+ # - XMem: https://github.com/hkchengrex/XMem/releases
106
+
107
+ # Run inference
108
+ CUDA_VISIBLE_DEVICES=0 uv run python scripts/run_baseline_inference.py \
109
+ --config configs/gpu_server.toml \
110
+ --input-path your_video.mp4 \
111
+ --output-dir output/ \
112
+ --max-frames 24 --point 320 240
113
+
114
+ # Start API server
115
+ uv run python -m monad.server
116
+ ```
117
+
118
+ ## Hardware Requirements
119
+
120
+ | Deployment | Min VRAM | Recommended | Max Frames |
121
+ |-----------|----------|-------------|------------|
122
+ | Edge (vit_b) | 6 GB | 8 GB | ~15 frames |
123
+ | Standard (vit_b) | 16 GB | 24 GB | ~30 frames |
124
+ | Full (vit_h) | 18 GB | 24 GB | ~30 frames |
125
+ | Extended | 48 GB+ | 80 GB | 100+ frames |
126
+
127
+ CPU fallback is functional but ~5x slower than GPU inference.
128
+
129
+ ## Repository Contents
130
+
131
+ ```
132
+ pytorch/ - Model weights (SafeTensors + TorchScript)
133
+ configs/ - Training & inference configurations
134
+ paper_mvp.toml - Paper-faithful MVP config
135
+ gpu_server.toml - GPU server optimized config
136
+ logs/ - Validation results & benchmarks
137
+ gpu_validation_results.json - Full GPU benchmark data
138
+ inference_runs/ - Sample inference outputs
139
+ crossing_objects/ - Demo masks + overlay video
140
+ frozen_models_manifest.json - Export manifest with checksums
141
+ ```
142
+
143
+ ## Dual Backend Support
144
+
145
+ MONAD supports both CUDA (GPU server / Jetson) and MLX (Apple Silicon) backends:
146
+ - **CUDA**: bf16 mixed precision, TorchScript exports
147
+ - **MLX**: fp32, native Apple GPU acceleration
148
+ - **CPU**: Fallback for any platform
149
+
150
+ Backend is auto-detected at runtime: CUDA > MLX > CPU.
151
+
152
+ ## Validation
153
+
154
+ - **71/74 tests pass** on CUDA (3 failures are macOS-only MLX tests)
155
+ - **82% code coverage**
156
+ - **67 tests pass** on macOS with MLX backend
157
+ - Inference validated on NVIDIA L4 (22GB) with 3 demo videos
158
+
159
+ ## Citation
160
+
161
+ ```bibtex
162
+ @article{monad2025,
163
+ title={IST-ROS: A flexible object segmentation and tracking framework for robotics applications},
164
+ journal={SoftwareX},
165
+ year={2025},
166
+ publisher={Elsevier}
167
+ }
168
+ ```
169
+
170
+ ## License
171
+
172
+ Apache 2.0 - Robot Flow Labs / AIFLOW LABS LIMITED
configs/gpu_server.toml ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # MONAD GPU Server Configuration
2
+ # 8x NVIDIA L4 (23GB each) — inference validation
3
+
4
+ [model]
5
+ backend = "sam_xmem"
6
+ device = "cuda"
7
+ cache_dir = "/mnt/forge-data/.hf_cache"
8
+
9
+ [segmentation]
10
+ mask_confidence_threshold = 0.90
11
+ binary_threshold = 0.50
12
+ quality_threshold = 0.85
13
+ stability_threshold = 0.80
14
+ default_point_radius = 24
15
+ return_single_mask = true
16
+
17
+ [sam]
18
+ model_type = "vit_h"
19
+ checkpoint = "/mnt/forge-data/models/sam_vit_h_4b8939.pth"
20
+
21
+ [xmem]
22
+ checkpoint = "/mnt/forge-data/models/XMem.pth"
23
+ buffer_size = 50
24
+ num_objects = 1
25
+ max_mid_term_frames = 10
26
+ min_mid_term_frames = 5
27
+ max_long_term_elements = 100000
28
+ num_prototypes = 128
29
+ top_k = 30
30
+ mem_every = 10
31
+ deep_update_every = -1
32
+ enable_long_term = true
33
+ enable_long_term_count_usage = true
34
+ size = -1
35
+ key_dim = 64
36
+ value_dim = 512
37
+ hidden_dim = 64
38
+ single_object = false
39
+
40
+ [mlx]
41
+ enabled = false
42
+ mode = "segment_anything"
43
+ examples_dir = ".cache/monad/mlx-examples"
44
+ sam_model = "vit_b"
45
+ checkpoint = ""
46
+ max_image_side = 1024
47
+ prefer_unified_memory = false
48
+
49
+ [tracking]
50
+ tracker_type = "single_object"
51
+ max_tracked_objects = 8
52
+ tracking_memory_frames = 30
53
+ confidence_decay = 0.98
54
+
55
+ [inference]
56
+ batch_size = 1
57
+ max_batch_size = 4
58
+ timeout_segmentation = 30.0
59
+ timeout_tracking = 60.0
60
+
61
+ [logging]
62
+ level = "INFO"
63
+ format = "json"
64
+ structured = true
65
+
66
+ [metrics]
67
+ enable_metrics = true
68
+ metrics_port = 9090
69
+ track_inference_time = true
70
+ track_memory_usage = true
71
+
72
+ [api]
73
+ host = "0.0.0.0"
74
+ port = 8000
75
+ workers = 1
76
+ reload = false
77
+ cors_origins = ["http://localhost:8000"]
78
+ cors_allow_credentials = false
79
+ cors_allow_methods = ["GET", "POST"]
80
+ cors_allow_headers = ["Content-Type"]
81
+
82
+ [grpc]
83
+ enabled = false
84
+ port = 50051
85
+
86
+ [storage]
87
+ models_cache_dir = "/mnt/forge-data/.hf_cache"
88
+ temp_dir = "/tmp/monad"
89
+ artifacts_dir = "/mnt/artifacts-datai/models/project_monad"
90
+ max_cache_size_gb = 50
91
+ run_retention_max_runs = 25
92
+ run_retention_max_age_days = 30
93
+
94
+ [features]
95
+ enable_rest_api = true
96
+ enable_grpc = false
97
+ enable_websocket = false
98
+ enable_batch_processing = true
99
+ enable_stream_processing = false
configs/paper_mvp.toml ADDED
@@ -0,0 +1,96 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [model]
2
+ backend = "sam_xmem"
3
+ device = "auto"
4
+ cache_dir = ".cache/monad"
5
+
6
+ [segmentation]
7
+ mask_confidence_threshold = 0.90
8
+ binary_threshold = 0.50
9
+ quality_threshold = 0.85
10
+ stability_threshold = 0.80
11
+ default_point_radius = 24
12
+ return_single_mask = true
13
+
14
+ [sam]
15
+ model_type = "vit_h"
16
+ checkpoint = "data/weights/sam_vit_h_4b8939.pth"
17
+
18
+ [xmem]
19
+ checkpoint = "data/weights/XMem.pth"
20
+ buffer_size = 50
21
+ num_objects = 1
22
+ max_mid_term_frames = 10
23
+ min_mid_term_frames = 5
24
+ max_long_term_elements = 100000
25
+ num_prototypes = 128
26
+ top_k = 30
27
+ mem_every = 10
28
+ deep_update_every = -1
29
+ enable_long_term = true
30
+ enable_long_term_count_usage = true
31
+ size = -1
32
+ key_dim = 64
33
+ value_dim = 512
34
+ hidden_dim = 64
35
+ single_object = false
36
+
37
+ [mlx]
38
+ enabled = false
39
+ mode = "segment_anything"
40
+ examples_dir = ".cache/monad/mlx-examples"
41
+ sam_model = "vit_b"
42
+ checkpoint = "data/weights/sam_vit_b_01ec64.pth"
43
+ max_image_side = 1024
44
+ prefer_unified_memory = true
45
+
46
+ [tracking]
47
+ tracker_type = "single_object"
48
+ max_tracked_objects = 8
49
+ tracking_memory_frames = 30
50
+ confidence_decay = 0.98
51
+
52
+ [inference]
53
+ batch_size = 1
54
+ max_batch_size = 4
55
+ timeout_segmentation = 30.0
56
+ timeout_tracking = 60.0
57
+
58
+ [logging]
59
+ level = "INFO"
60
+ format = "json"
61
+ structured = true
62
+
63
+ [metrics]
64
+ enable_metrics = true
65
+ metrics_port = 9090
66
+ track_inference_time = true
67
+ track_memory_usage = true
68
+
69
+ [api]
70
+ host = "0.0.0.0"
71
+ port = 8000
72
+ workers = 1
73
+ reload = false
74
+ cors_origins = ["http://localhost:3000", "http://localhost:8000"]
75
+ cors_allow_credentials = false
76
+ cors_allow_methods = ["GET", "POST", "DELETE", "OPTIONS"]
77
+ cors_allow_headers = ["Content-Type", "Authorization"]
78
+
79
+ [grpc]
80
+ enabled = false
81
+ port = 50051
82
+
83
+ [storage]
84
+ models_cache_dir = ".cache/monad/models"
85
+ temp_dir = "/tmp/monad"
86
+ artifacts_dir = "artifacts"
87
+ max_cache_size_gb = 50
88
+ run_retention_max_runs = 25
89
+ run_retention_max_age_days = 30
90
+
91
+ [features]
92
+ enable_rest_api = true
93
+ enable_grpc = false
94
+ enable_websocket = false
95
+ enable_batch_processing = true
96
+ enable_stream_processing = false
frozen_models_manifest.json ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "timestamp": "2026-03-21T09:07:38",
3
+ "gpu": "NVIDIA L4",
4
+ "torch_version": "2.6.0+cu124",
5
+ "cuda_version": "12.4",
6
+ "exports": {
7
+ "sam_vit_h_encoder": {
8
+ "path": "/mnt/artifacts-datai/exports/project_monad/sam_vit_h_encoder.pt",
9
+ "size_mb": 2430.8,
10
+ "format": "torchscript",
11
+ "status": "ok"
12
+ },
13
+ "sam_vit_h_state_dict": {
14
+ "path": "/mnt/artifacts-datai/exports/project_monad/sam_vit_h_state_dict.pt",
15
+ "size_mb": 2445.8,
16
+ "format": "state_dict",
17
+ "status": "ok"
18
+ },
19
+ "sam_vit_b_encoder": {
20
+ "path": "/mnt/artifacts-datai/exports/project_monad/sam_vit_b_encoder.pt",
21
+ "size_mb": 342.3,
22
+ "format": "torchscript",
23
+ "status": "ok"
24
+ },
25
+ "sam_vit_b_state_dict": {
26
+ "path": "/mnt/artifacts-datai/exports/project_monad/sam_vit_b_state_dict.pt",
27
+ "size_mb": 357.7,
28
+ "format": "state_dict",
29
+ "status": "ok"
30
+ },
31
+ "xmem_state_dict": {
32
+ "path": "/mnt/artifacts-datai/exports/project_monad/xmem_state_dict.pt",
33
+ "size_mb": 237.5,
34
+ "format": "state_dict",
35
+ "status": "ok"
36
+ },
37
+ "xmem_frozen": {
38
+ "path": "/mnt/artifacts-datai/exports/project_monad/xmem_frozen.pt",
39
+ "size_mb": 237.5,
40
+ "format": "frozen_bundle",
41
+ "status": "ok"
42
+ }
43
+ }
44
+ }
inference_runs/crossing_objects/masks/mask_0000.png ADDED
inference_runs/crossing_objects/masks/mask_0001.png ADDED
inference_runs/crossing_objects/masks/mask_0002.png ADDED
inference_runs/crossing_objects/masks/mask_0003.png ADDED
inference_runs/crossing_objects/masks/mask_0004.png ADDED
inference_runs/crossing_objects/masks/mask_0005.png ADDED
inference_runs/crossing_objects/masks/mask_0006.png ADDED
inference_runs/crossing_objects/masks/mask_0007.png ADDED
inference_runs/crossing_objects/masks/mask_0008.png ADDED
inference_runs/crossing_objects/masks/mask_0009.png ADDED
inference_runs/crossing_objects/masks/mask_0010.png ADDED
inference_runs/crossing_objects/masks/mask_0011.png ADDED
inference_runs/crossing_objects/masks/mask_0012.png ADDED
inference_runs/crossing_objects/masks/mask_0013.png ADDED
inference_runs/crossing_objects/masks/mask_0014.png ADDED
inference_runs/crossing_objects/masks/mask_0015.png ADDED
inference_runs/crossing_objects/masks/mask_0016.png ADDED
inference_runs/crossing_objects/masks/mask_0017.png ADDED
inference_runs/crossing_objects/masks/mask_0018.png ADDED
inference_runs/crossing_objects/masks/mask_0019.png ADDED
inference_runs/crossing_objects/masks/mask_0020.png ADDED
inference_runs/crossing_objects/masks/mask_0021.png ADDED
inference_runs/crossing_objects/masks/mask_0022.png ADDED
inference_runs/crossing_objects/masks/mask_0023.png ADDED
inference_runs/crossing_objects/metrics.json ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "backend": "sam_xmem",
3
+ "requested_device": "cuda",
4
+ "resolved_device": "cuda",
5
+ "backend_ready": true,
6
+ "backend_notes": [
7
+ "Research backend available"
8
+ ],
9
+ "frames_processed": 24,
10
+ "latency_ms": {
11
+ "mean": 84.62345065214797,
12
+ "max": 125.68026799999643
13
+ },
14
+ "tracker_id": "trk_b03b9f3248",
15
+ "object_id": "obj_ac9a2a710a",
16
+ "frame_limit": 24
17
+ }
inference_runs/crossing_objects/overlay.mp4 ADDED
Binary file (33.1 kB). View file
 
logs/gpu_validation_results.json ADDED
@@ -0,0 +1,118 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "timestamp": "2026-03-21T08:44:14",
3
+ "cuda_devices": 2,
4
+ "gpu_names": [
5
+ "NVIDIA L4",
6
+ "NVIDIA L4"
7
+ ],
8
+ "model_loading": {
9
+ "vram_before": {
10
+ "gpu_0": {
11
+ "allocated_mb": 0.0,
12
+ "reserved_mb": 0.0,
13
+ "total_mb": 22563.1,
14
+ "utilization_pct": 0.0
15
+ },
16
+ "gpu_1": {
17
+ "allocated_mb": 0.0,
18
+ "reserved_mb": 0.0,
19
+ "total_mb": 22563.1,
20
+ "utilization_pct": 0.0
21
+ }
22
+ },
23
+ "vram_after": {
24
+ "gpu_0": {
25
+ "allocated_mb": 2695.1,
26
+ "reserved_mb": 2732.0,
27
+ "total_mb": 22563.1,
28
+ "utilization_pct": 11.9
29
+ },
30
+ "gpu_1": {
31
+ "allocated_mb": 0.0,
32
+ "reserved_mb": 0.0,
33
+ "total_mb": 22563.1,
34
+ "utilization_pct": 0.0
35
+ }
36
+ },
37
+ "load_time_s": 12.648,
38
+ "total_init_time_s": 12.755,
39
+ "backend": "sam_xmem",
40
+ "device_requested": "cuda",
41
+ "device_resolved": "cuda",
42
+ "sam_model_type": "vit_h",
43
+ "sam_checkpoint": "/mnt/forge-data/models/sam_vit_h_4b8939.pth",
44
+ "xmem_checkpoint": "/mnt/forge-data/models/XMem.pth"
45
+ },
46
+ "inference_benchmarks": [
47
+ {
48
+ "video": "crossing_objects.mp4",
49
+ "resolution": "640x480",
50
+ "video_fps": 24.0,
51
+ "total_video_frames": 36,
52
+ "frames_processed": 24,
53
+ "max_frames_limit": 24,
54
+ "latency_mean_ms": 84.62,
55
+ "latency_max_ms": 125.68,
56
+ "total_time_s": 3.479,
57
+ "peak_vram_mb": 18250.4,
58
+ "throughput_fps": 6.9,
59
+ "backend": "sam_xmem",
60
+ "device": "cuda",
61
+ "tracker_id": "trk_b03b9f3248",
62
+ "object_id": "obj_ac9a2a710a"
63
+ },
64
+ {
65
+ "video": "moving_circle.mp4",
66
+ "resolution": "640x480",
67
+ "video_fps": 24.0,
68
+ "total_video_frames": 48,
69
+ "frames_processed": 24,
70
+ "max_frames_limit": 24,
71
+ "latency_mean_ms": 82.75,
72
+ "latency_max_ms": 103.4,
73
+ "total_time_s": 3.31,
74
+ "peak_vram_mb": 18250.4,
75
+ "throughput_fps": 7.25,
76
+ "backend": "sam_xmem",
77
+ "device": "cuda",
78
+ "tracker_id": "trk_6c1a8b0c67",
79
+ "object_id": "obj_1bb04c1f69"
80
+ },
81
+ {
82
+ "video": "static_scene.mp4",
83
+ "resolution": "640x480",
84
+ "video_fps": 24.0,
85
+ "total_video_frames": 12,
86
+ "frames_processed": 12,
87
+ "max_frames_limit": 24,
88
+ "latency_mean_ms": 73.22,
89
+ "latency_max_ms": 94.14,
90
+ "total_time_s": 1.994,
91
+ "peak_vram_mb": 10417.9,
92
+ "throughput_fps": 6.02,
93
+ "backend": "sam_xmem",
94
+ "device": "cuda",
95
+ "tracker_id": "trk_fb592fd895",
96
+ "object_id": "obj_527b82ec72"
97
+ },
98
+ {
99
+ "video": "synthetic_stress_test.mp4",
100
+ "error": "OOM at 48 frames: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.03 GiB of which 5.",
101
+ "note": "SAM vit_h + XMem exceeds 22GB L4 at this frame count"
102
+ }
103
+ ],
104
+ "final_vram": {
105
+ "gpu_0": {
106
+ "allocated_mb": 22248.9,
107
+ "reserved_mb": 22318.0,
108
+ "total_mb": 22563.1,
109
+ "utilization_pct": 98.6
110
+ },
111
+ "gpu_1": {
112
+ "allocated_mb": 0.0,
113
+ "reserved_mb": 0.0,
114
+ "total_mb": 22563.1,
115
+ "utilization_pct": 0.0
116
+ }
117
+ }
118
+ }
pytorch/sam_vit_b_encoder.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3dd0a2dddd7ba744d9638035feaf065cd85f02855bb1a09e394610cf7f08d756
3
+ size 358970620
pytorch/sam_vit_b_v1.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b2d6ca146991a3484ecca745be30df7e142aa136296b9fbb740951b09aa85ab9
3
+ size 374979208
pytorch/sam_vit_h_encoder.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1389261ae8fbc6c6770017d3561aa6666985701a8d412333d93a4228b7ede683
3
+ size 2548843796
pytorch/sam_vit_h_v1.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:298e1126d179717313e8781b55c7d044f60f0ee40d8316d5c3438ac835346070
3
+ size 2564431552
pytorch/xmem_frozen.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11630d50f4a0058d4e62905e0f461b4b072871e38edc839ced1f47a89a935592
3
+ size 249028922
pytorch/xmem_v1.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:76faa1d6fe3a4726a7d4f5e81d8e6e7e88a8669cc5df524f52f540232b6f0bb5
3
+ size 248924912