RECON.md

Phase 0 reconnaissance findings: model verification, Gradio APIs, access status. Updated: June 4, 2026.

Gradio

  • Version: TBD (will verify on first pip install gradio)
  • gr.Blocks: expected ✓ (used in app.py skeleton)
  • gr.Video: expected ✓
  • gr.Walkthrough / gr.Step: TBD (verify in Phase 2)
  • gr.Navbar: TBD (verify in Phase 2)
  • UI approach: gr.Blocks + custom CSS/theme (escalate to Server only if needed)
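The TBD items above can be settled with a small probe once Gradio is installed. This is a minimal sketch, not the project's actual check: `probe_components` and `report` are hypothetical helper names, and the probe only tests whether an attribute exists on the module, not whether the component behaves as expected.

```python
def probe_components(module, names):
    """Return {name: bool} saying whether each attribute exists on the module."""
    return {name: hasattr(module, name) for name in names}


def report(module, names):
    """Build printable lines: the module version, then one line per component."""
    version = getattr(module, "__version__", "unknown")
    lines = [f"version: {version}"]
    for name, present in probe_components(module, names).items():
        lines.append(f"{name}: {'present' if present else 'missing'}")
    return lines


# Usage once gradio is installed (component names taken from the list above):
#   import gradio as gr
#   print("\n".join(report(gr, ["Blocks", "Video", "Walkthrough", "Step", "Navbar"])))
```

Passing the module in as an argument keeps the helpers usable before `pip install gradio` has run, and makes them easy to test against a stand-in object.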

Python

  • Python 3.13.9 (local dev)
  • pytest 9.0.2, numpy, opencv-python installed

Model Verification

| Model | Params | License | GGUF | ZeroGPU | Status |
|---|---|---|---|---|---|
| YOLO26l-Pose (primary) | 0.026B | AGPL-3.0 | n/a | ✓ (6.5ms T4) | ready |
| YOLO26x-Pose (HQ alt) | 0.058B | AGPL-3.0 | n/a | ✓ (12.2ms T4) | ready |
| SAM 3.1 base (sam2.1_hiera_base_plus) | ~0.85B | SAM License | n/a | ✓ | access accepted |
| SAM 3D Body (facebook/sam-3d-body-dinov3) | 0.84B (DINOv3-H+) | SAM License | n/a | ✓ | INTEGRATED |
| Sapiens2 Pose (noahcao/sapiens-pose-coco) | ~0.6B | CC-BY-NC-4.0 | n/a | ✓ | access accepted |
| ST-GCN (pyskl) | ~0.03B | Apache-2.0 | n/a | ✓ | ready |
| Qwen3-VL-8B-Instruct | 8B | Apache-2.0 | ✓ | llama.cpp | ready |
| Qwen3-VL-Embedding-8B | 8B | Apache-2.0 | ✓ | llama.cpp | ready |

Param Sum

~17.63B, well under the 32B limit.
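The budget check can be reproduced from the Model Verification table. This is a minimal sketch that treats the approximate ("~") figures as exact, which is an assumption. With the values as listed, the rows sum to about 18.40B rather than 17.63B, so the quoted figure may rest on slightly different inputs. The total is under 32B either way, and the largest single model is 8B, so the check passes under both readings of the limit.

```python
# Parameter counts in billions, copied from the Model Verification table.
# "~" values are treated as exact here (an assumption).
PARAMS_B = {
    "YOLO26l-Pose": 0.026,
    "YOLO26x-Pose": 0.058,
    "SAM 3.1 base": 0.85,
    "SAM 3D Body": 0.84,
    "Sapiens2 Pose": 0.6,
    "ST-GCN": 0.03,
    "Qwen3-VL-8B-Instruct": 8.0,
    "Qwen3-VL-Embedding-8B": 8.0,
}
LIMIT_B = 32.0


def total_params_b(params):
    """Sum of all parameter counts, in billions."""
    return sum(params.values())


def largest_model_b(params):
    """Largest single parameter count, in billions."""
    return max(params.values())


total = total_params_b(PARAMS_B)
largest = largest_model_b(PARAMS_B)
print(f"summed: {total:.3f}B (under limit: {total <= LIMIT_B})")
print(f"largest single model: {largest:.3f}B (under limit: {largest <= LIMIT_B})")
```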

Gated Access Status (as of Jun 4, 2026)

  • SAM 3.1 (facebookresearch/sam3): accepted
  • SAM 3D Body (facebook/sam-3d-body-dinov3): ACCEPTED (confirmed Jun 4)
  • Sapiens2 Pose (noahcao/sapiens-pose-coco): accepted

Open Questions

  • Confirm "≤32B" = summed vs per-model in Discord AMA
  • AGPL-3.0 YOLO OK for hackathon submission? (Likely yes for non-commercial demo)

llama.cpp Build Plan

  • CPU-only build first (avoids libcudart.so issues on Spaces)
  • Fallback: transformers + spaces.GPU for VLM inference
  • GGUF quantized Qwen3-VL-8B at Q4_K_M (~4.5GB)
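The "llama.cpp first, transformers as fallback" order can be expressed as a small availability check. This is a sketch under assumptions: it supposes the llama.cpp path goes through the llama-cpp-python bindings (import name `llama_cpp`), which these notes do not confirm, and `pick_vlm_backend` is a hypothetical helper name. It only checks that a package is importable, not that a model loads.

```python
import importlib.util

# Preference order from the build plan: llama.cpp first, transformers second.
# The "llama_cpp" import name assumes the llama-cpp-python bindings are used.
BACKENDS = (
    ("llama.cpp", "llama_cpp"),
    ("transformers", "transformers"),
)


def pick_vlm_backend(find_spec=importlib.util.find_spec):
    """Return the first backend whose module can be located, else None.

    find_spec is injectable so the selection logic can be tested without
    installing either package.
    """
    for backend, module_name in BACKENDS:
        if find_spec(module_name) is not None:
            return backend
    return None


print("selected VLM backend:", pick_vlm_backend())
```

Checking with `find_spec` avoids importing a heavy package just to learn whether it is present, which keeps Space startup cheap.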

Key Decisions

  • Primary pose: YOLO26l-Pose (fastest, well-tested)
  • Fallback pose: Sapiens2 (more keypoints, slower)
  • 3D body: INTEGRATED; uses setup_sam_3d_body() from notebook.utils and outputs MHR joints
    • API: estimator.process_one_image(rgb_image) β€” single RGB np.ndarray
    • Model variants: DINOv3-H+ (840M) default, ViT-H (631M) smaller
    • Temporal smoothing via EMA (alpha=0.3) to reduce single-frame jitter
    • config.enable_3d=False by default; flipped when checkpoint verified on Space
  • VLM: Qwen3-VL-8B via llama.cpp (Judge + Classifier)
  • Embeddings: Qwen3-VL-Embedding-8B via llama.cpp (Retrieval)
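The EMA temporal smoothing mentioned under the 3D body decision can be sketched as follows. This is an illustration, not the project's code: `EmaSmoother` is a hypothetical name, and the convention that alpha weights the newest frame (so alpha=0.3 keeps 70% of the previous estimate) is an assumption, since the notes give only the value.

```python
class EmaSmoother:
    """Exponential moving average over per-frame values.

    Works with anything that supports scalar multiplication and addition,
    for example floats or NumPy arrays of joint coordinates.
    Assumed convention: smoothed = alpha * new + (1 - alpha) * previous.
    """

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def update(self, value):
        """Feed one frame's value and return the smoothed estimate."""
        if self.state is None:
            # First frame: nothing to blend with yet.
            self.state = value
        else:
            self.state = self.alpha * value + (1.0 - self.alpha) * self.state
        return self.state

    def reset(self):
        """Forget history, e.g. at a scene cut or when tracking is lost."""
        self.state = None


smoother = EmaSmoother(alpha=0.3)
for raw in (0.0, 10.0, 10.0):
    print(raw, "->", round(smoother.update(raw), 3))
```

A lower alpha suppresses more jitter but makes the joints lag behind fast motion, so the value is a trade-off worth revisiting against real footage.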