# RECON.md
Phase 0 reconnaissance findings: model verification, Gradio APIs, access status.
Updated: June 4, 2026.
## Gradio
- Version: TBD (will verify on first `pip install gradio`)
- gr.Blocks: expected ✓ (used in app.py skeleton)
- gr.Video: expected ✓
- gr.Walkthrough / gr.Step: TBD (verify in Phase 2)
- gr.Navbar: TBD (verify in Phase 2)
- UI approach: gr.Blocks + custom CSS/theme (escalate to Server only if needed)
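
The TBD items above could be checked with a small probe once Gradio is installed. This is a minimal sketch, not the project's actual tooling: `probe_components` and `probe_gradio` are hypothetical helper names, and the only assumption about Gradio is that components are exposed as top-level attributes of the `gradio` module.

```python
import importlib


def probe_components(module, names):
    """Return {name: bool} telling whether each attribute exists on `module`."""
    return {name: hasattr(module, name) for name in names}


def probe_gradio(names=("Blocks", "Video", "Walkthrough", "Step", "Navbar")):
    """Report the installed Gradio version and which components are present.

    Returns None when gradio is not installed yet.
    """
    try:
        gr = importlib.import_module("gradio")
    except ImportError:
        return None
    return {
        "version": getattr(gr, "__version__", "unknown"),
        "components": probe_components(gr, names),
    }


if __name__ == "__main__":
    print(probe_gradio())
```

Running the script after the first `pip install gradio` would fill in the version and settle the `gr.Walkthrough` / `gr.Step` / `gr.Navbar` entries in one pass.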
## Python
- Python 3.13.9 (local dev)
- pytest 9.0.2, numpy, opencv-python installed
## Model Verification
| Model | Params | License | GGUF | ZeroGPU | Status |
|---|---|---|---|---|---|
| YOLO26l-Pose (primary) | 0.026B | AGPL-3.0 | n/a | ✓ (6.5ms T4) | ready |
| YOLO26x-Pose (HQ alt) | 0.058B | AGPL-3.0 | n/a | ✓ (12.2ms T4) | ready |
| SAM 3.1 base (sam2.1_hiera_base_plus) | ~0.85B | SAM License | n/a | ✓ | access accepted |
| SAM 3D Body (facebook/sam-3d-body-dinov3) | 0.84B (DINOv3-H+) | SAM License | n/a | ✓ | **INTEGRATED** |
| Sapiens2 Pose (noahcao/sapiens-pose-coco) | ~0.6B | CC-BY-NC-4.0 | n/a | ✓ | access accepted |
| ST-GCN (pyskl) | ~0.03B | Apache-2.0 | n/a | ✓ | ready |
| Qwen3-VL-8B-Instruct | 8B | Apache-2.0 | ✓ | llama.cpp | ready |
| Qwen3-VL-Embedding-8B | 8B | Apache-2.0 | ✓ | llama.cpp | ready |
## Param Sum
~17.63B, well under the 32B limit.
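
The sum can be re-derived from the Model Verification table with a few lines of Python. This is a sketch for double-checking only: the dictionary keys are shortened labels, the values are the table's (partly approximate) figures in billions, and `within_limit` is a hypothetical helper that covers both readings of the limit (summed vs per-model). With the values exactly as listed, the total comes to about 18.40B rather than 17.63B, so one of the approximate entries may have been counted differently; either figure is under 32B.

```python
# Parameter counts in billions, copied from the Model Verification table.
PARAMS_B = {
    "YOLO26l-Pose": 0.026,
    "YOLO26x-Pose": 0.058,
    "SAM 3.1 base": 0.85,
    "SAM 3D Body": 0.84,
    "Sapiens2 Pose": 0.6,
    "ST-GCN": 0.03,
    "Qwen3-VL-8B-Instruct": 8.0,
    "Qwen3-VL-Embedding-8B": 8.0,
}
LIMIT_B = 32.0


def total_params(params):
    """Sum of all parameter counts, in billions."""
    return sum(params.values())


def within_limit(params, limit=LIMIT_B, per_model=False):
    """Check the limit either per model or on the summed total."""
    if per_model:
        return max(params.values()) <= limit
    return total_params(params) <= limit


if __name__ == "__main__":
    print(f"total: {total_params(PARAMS_B):.3f}B")
    print("summed OK:", within_limit(PARAMS_B))
    print("per-model OK:", within_limit(PARAMS_B, per_model=True))
```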
## Gated Access Status (as of Jun 4, 2026)
- [x] SAM 3.1 (facebookresearch/sam3): accepted
- [x] SAM 3D Body (facebook/sam-3d-body-dinov3): **ACCEPTED** (confirmed Jun 4)
- [x] Sapiens2 Pose (noahcao/sapiens-pose-coco): accepted
## Open Questions
- [ ] Confirm "≤32B" = summed vs per-model in Discord AMA
- [ ] AGPL-3.0 YOLO OK for hackathon submission? (Likely yes for non-commercial demo)
## llama.cpp Build Plan
- CPU-only build first (avoids libcudart.so issues on Spaces)
- Fallback: transformers + spaces.GPU for VLM inference
- GGUF quantized Qwen3-VL-8B at Q4_K_M (~4.5GB)
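
The llama.cpp-first, transformers-fallback order above could be expressed as a small backend picker. This is a sketch under stated assumptions: it assumes llama.cpp is reached through the `llama-cpp-python` bindings (import name `llama_cpp`), which the plan does not specify, and `pick_vlm_backend` is a hypothetical helper name. It only detects what is importable; it does not load any model.

```python
import importlib.util


def pick_vlm_backend(has_module=None):
    """Return the first available backend name, or None if neither is installed.

    `has_module` is injectable so the choice can be exercised without
    installing either package; by default it asks importlib.
    """
    if has_module is None:
        def has_module(name):
            return importlib.util.find_spec(name) is not None

    # Preference order from the build plan: llama.cpp first, transformers second.
    for backend in ("llama_cpp", "transformers"):
        if has_module(backend):
            return backend
    return None


if __name__ == "__main__":
    print("VLM backend:", pick_vlm_backend())
```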
## Key Decisions
- Primary pose: YOLO26l-Pose (fastest, well-tested)
- Fallback pose: Sapiens2 (more keypoints, slower)
- 3D body: INTEGRATED. Uses `setup_sam_3d_body()` from `notebook.utils`; outputs MHR joints
- API: `estimator.process_one_image(rgb_image)`, which takes a single RGB np.ndarray
- Model variants: DINOv3-H+ (840M) default, ViT-H (631M) smaller
- Temporal smoothing via EMA (alpha=0.3) to reduce single-frame jitter
- config.enable_3d=False by default; flipped when checkpoint verified on Space
- VLM: Qwen3-VL-8B via llama.cpp (Judge + Classifier)
- Embeddings: Qwen3-VL-Embedding-8B via llama.cpp (Retrieval)
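
The EMA smoothing decision above can be sketched as follows. This is an illustrative sketch, not the project's implementation: `EmaSmoother` is a hypothetical class name, and it assumes `alpha` weights the newest frame (`smoothed = alpha * new + (1 - alpha) * previous`), which the notes do not spell out. It uses only arithmetic operators, so it works on plain floats as well as on NumPy joint arrays.

```python
import copy


class EmaSmoother:
    """Exponential moving average over a stream of per-frame values."""

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None  # no history until the first frame arrives

    def update(self, value):
        """Blend `value` into the running average and return the result."""
        if self.state is None:
            # First frame: nothing to smooth against, so pass it through.
            # Copy so later in-place edits by the caller cannot alter state.
            self.state = copy.copy(value)
        else:
            self.state = self.alpha * value + (1.0 - self.alpha) * self.state
        return self.state

    def reset(self):
        """Forget history, e.g. when a new video or subject starts."""
        self.state = None
```

With `alpha=0.3`, each new frame contributes 30% of the output, so a single-frame outlier is damped while sustained motion still comes through after a few frames. A lower `alpha` smooths more at the cost of added lag.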