# RECON.md
Phase 0 reconnaissance findings: model verification, Gradio APIs, access status.
Updated: June 4, 2026.
## Gradio
- Version: TBD (to be verified on first `pip install gradio`)
- `gr.Blocks`: expected ✓ (used in the `app.py` skeleton)
- `gr.Video`: expected ✓
- `gr.Walkthrough` / `gr.Step`: TBD (verify in Phase 2)
- `gr.Navbar`: TBD (verify in Phase 2)
- UI approach: `gr.Blocks` + custom CSS/theme (escalate to Server only if needed)
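
The TBD items above could be settled with a small probe run right after installation. The following is a minimal sketch, not part of the project: `installed_version` and `probe_attrs` are hypothetical helper names, and the check assumes that the components in question would be exposed as top-level attributes of the `gradio` package.

```python
import importlib
import importlib.metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def probe_attrs(module_name, names):
    """Map each attribute name to True/False; all False if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {name: False for name in names}
    return {name: hasattr(module, name) for name in names}
```

Calling `installed_version("gradio")` and `probe_attrs("gradio", ["Blocks", "Video", "Walkthrough", "Step", "Navbar"])` would then fill in the version and the expected/TBD entries in one pass.
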
## Python
- Python 3.13.9 (local dev)
- pytest 9.0.2, numpy, opencv-python installed
## Model Verification
| Model | Params | License | GGUF | ZeroGPU | Status |
|---|---|---|---|---|---|
| YOLO26l-Pose (primary) | 0.026B | AGPL-3.0 | n/a | ✓ (6.5ms T4) | ready |
| YOLO26x-Pose (HQ alt) | 0.058B | AGPL-3.0 | n/a | ✓ (12.2ms T4) | ready |
| SAM 3.1 base (sam2.1_hiera_base_plus) | ~0.85B | SAM License | n/a | ✓ | access accepted |
| SAM 3D Body (facebook/sam-3d-body-dinov3) | 0.84B (DINOv3-H+) | SAM License | n/a | ✓ | **INTEGRATED** |
| Sapiens2 Pose (noahcao/sapiens-pose-coco) | ~0.6B | CC-BY-NC-4.0 | n/a | ✓ | access accepted |
| ST-GCN (pyskl) | ~0.03B | Apache-2.0 | n/a | ✓ | ready |
| Qwen3-VL-8B-Instruct | 8B | Apache-2.0 | ✓ | llama.cpp | ready |
| Qwen3-VL-Embedding-8B | 8B | Apache-2.0 | ✓ | llama.cpp | ready |
## Param Sum
~18.4B summed over all eight table rows (0.026 + 0.058 + 0.85 + 0.84 + 0.6 + 0.03 + 8 + 8), well under the 32B limit.
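
The total can be re-derived from the Params column of the Model Verification table. The sketch below copies those values as listed (several are approximate in the table) and checks them against the limit under both readings of the rule, summed and per-model. The names `PARAMS_B`, `total_params_b` and `within_budget` are illustrative only.

```python
# Parameter counts in billions, copied from the Model Verification table.
# Entries marked "~" in the table are approximate.
PARAMS_B = {
    "YOLO26l-Pose": 0.026,
    "YOLO26x-Pose": 0.058,
    "SAM 3.1 base": 0.85,
    "SAM 3D Body": 0.84,
    "Sapiens2 Pose": 0.6,
    "ST-GCN": 0.03,
    "Qwen3-VL-8B-Instruct": 8.0,
    "Qwen3-VL-Embedding-8B": 8.0,
}


def total_params_b(params):
    """Sum of all parameter counts, in billions."""
    return sum(params.values())


def within_budget(params, limit_b=32.0, per_model=False):
    """Check the limit against the largest model (per_model) or the summed total."""
    if per_model:
        return max(params.values()) <= limit_b
    return total_params_b(params) <= limit_b


if __name__ == "__main__":
    print(f"total: {total_params_b(PARAMS_B):.2f}B")
    print("summed OK:", within_budget(PARAMS_B))
    print("per-model OK:", within_budget(PARAMS_B, per_model=True))
```

Since the stack passes under the stricter summed reading, it also passes under the per-model reading.
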
## Gated Access Status (as of Jun 4, 2026)
- [x] SAM 3.1 (facebookresearch/sam3): accepted
- [x] SAM 3D Body (facebook/sam-3d-body-dinov3): **ACCEPTED** (confirmed Jun 4)
- [x] Sapiens2 Pose (noahcao/sapiens-pose-coco): accepted
## Open Questions
- [ ] Confirm in the Discord AMA whether "≤32B" means summed or per-model
- [ ] Is AGPL-3.0 YOLO OK for a hackathon submission? (Likely yes for a non-commercial demo)
## llama.cpp Build Plan
- CPU-only build first (avoids `libcudart.so` issues on Spaces)
- Fallback: transformers + `spaces.GPU` for VLM inference
- GGUF-quantized Qwen3-VL-8B at Q4_K_M (~4.5GB)
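
The primary/fallback ordering above could be encoded as a small backend selector. This is a sketch under assumptions: it presumes llama.cpp would be reached through a Python package importable as `llama_cpp` (the plan does not say how llama.cpp is invoked), and `module_available` and `choose_vlm_backend` are hypothetical names.

```python
import importlib.util


def module_available(name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_vlm_backend(available=module_available):
    """Prefer the CPU-only llama.cpp path, then transformers + spaces.GPU."""
    # Assumption: llama.cpp is used via a module named `llama_cpp`.
    if available("llama_cpp"):
        return "llama.cpp"
    if available("transformers") and available("spaces"):
        return "transformers+spaces.GPU"
    raise RuntimeError("no VLM backend available: install llama.cpp bindings or transformers")
```

Passing the availability check in as a parameter keeps the selection logic testable on a machine where neither backend is installed.
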
## Key Decisions
- Primary pose: YOLO26l-Pose (fastest, well-tested)
- Fallback pose: Sapiens2 (more keypoints, slower)
- 3D body: INTEGRATED; uses `setup_sam_3d_body()` from `notebook.utils`, outputs MHR joints
  - API: `estimator.process_one_image(rgb_image)`, which takes a single RGB `np.ndarray`
  - Model variants: DINOv3-H+ (840M) default, ViT-H (631M) smaller
  - Temporal smoothing via EMA (alpha=0.3) to reduce single-frame jitter
  - `config.enable_3d=False` by default; to be flipped once the checkpoint is verified on the Space
- VLM: Qwen3-VL-8B via llama.cpp (Judge + Classifier)
- Embeddings: Qwen3-VL-Embedding-8B via llama.cpp (Retrieval)
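
The EMA smoothing decision (alpha=0.3) can be illustrated with a minimal sketch. It assumes the standard exponential moving average, `s_t = alpha * x_t + (1 - alpha) * s_{t-1}`, seeded with the first frame. `EmaSmoother` is a hypothetical name, not the project's actual class.

```python
class EmaSmoother:
    """Exponential moving average over per-frame values.

    Works on plain floats and on array-likes that support scalar
    multiplication and addition (for example a NumPy array of joints).
    """

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state = None  # no frame seen yet

    def update(self, value):
        """Blend the new frame into the running estimate and return it."""
        if self._state is None:
            # The first frame seeds the average unchanged.
            self._state = value
        else:
            self._state = self.alpha * value + (1.0 - self.alpha) * self._state
        return self._state

    def reset(self):
        """Forget history, e.g. when a new clip or a new subject starts."""
        self._state = None
```

With alpha=0.3 each new frame contributes 30% of the output, so a one-frame spike is damped while slower motion still comes through after a few frames. Lower alpha gives smoother but laggier joints.
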