Spaces:

build-small-hackathon
/

Pozify

Running on Zero

Pozify / docs /02-technical-setup.md

refactor: streamline evidence generation in coach summary; enhance JSON output contract and improve tokenization handling for local chat

39810d5 11 days ago

preview code

Raw

History Blame Contribute Delete

7.43 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Pozify Technical Setup And Runtime

This document holds the command-heavy setup, runtime, training, and verification notes for Pozify. The main README stays focused on the project, model strategy, and product story.

Run The App Locally

This repo uses a src/ layout, but uv is configured with package = false.

uv sync
uv run python app.py

Then open http://127.0.0.1:7860.

Mock vs Real Mode

By default:

if no video is provided, Pozify uses mock mode
if a real video is uploaded, Pozify runs the full analysis pipeline

Force mock mode:

POZIFY_MOCK_MODE=1 uv run python app.py

Force real mode:

POZIFY_MOCK_MODE=0 uv run python app.py

If you already have the MediaPipe task file locally:

POZIFY_MEDIAPIPE_POSE_MODEL=/path/to/pose_landmarker_lite.task \
POZIFY_MOCK_MODE=0 \
uv run python app.py

Coach Summary Runtime Options

1. Fine-tuned coach model

The app defaults to the fine-tuned coach-summary model:

export POZIFY_COACH_SUMMARY_MODEL=build-small-hackathon/pozify-coach-summary1
uv run python app.py

Pozify tries chat_completion first and falls back to text_generation when Hugging Face reports that the repo is not a chat model. The deterministic fallback summary remains enabled if hosted inference is unavailable or the model output fails validation.

For regular Hugging Face Spaces, keep the provider on hosted inference unless you have a dedicated local model runtime:

POZIFY_COACH_SUMMARY_PROVIDER=hf_inference
POZIFY_COACH_SUMMARY_MODEL=build-small-hackathon/pozify-coach-summary1

For Hugging Face ZeroGPU Spaces, local Transformers is selected automatically so the app does not call the hosted Hugging Face Inference API. You can also set it explicitly:

POZIFY_COACH_SUMMARY_PROVIDER=local_transformers
POZIFY_COACH_SUMMARY_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
POZIFY_SPACES_GPU_DURATION=300

HF_TOKEN is only needed for hf_inference or for downloading a private/gated local model repo. Pozify uses the Nemotron implementation bundled with Transformers instead of downloading remote model code. If fast Mamba kernels are unavailable at runtime, Pozify caps the local prompt context before generation to avoid the slow naive Mamba path crashing CUDA.

2. Use the fine-tuned merged model locally

Download the merged repo locally, then point Pozify at it:

export POZIFY_COACH_SUMMARY_LOCAL_MODEL_DIR=/absolute/path/to/merged_model
export POZIFY_COACH_SUMMARY_BASE_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
export POZIFY_COACH_SUMMARY_ADAPTER_ID=build-small-hackathon/pozify-coach-summary1
uv run python app.py

This is the simplest way to use build-small-hackathon/pozify-coach-summary1 today without adding a dedicated inference endpoint.

3. Base cloud model override

If you need the Nemotron base-model runtime:

export POZIFY_COACH_SUMMARY_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
uv run python app.py

4. llama.cpp

Pozify can send the coach-summary prompt to a local llama-server that exposes the OpenAI-compatible /v1/chat/completions endpoint.

Example:

llama-server \
  --model /path/to/nemotron-3-nano-4b.gguf \
  --ctx-size 4096 \
  --n-gpu-layers 99 \
  --host 127.0.0.1 \
  --port 8080

Then:

POZIFY_COACH_SUMMARY_PROVIDER=llama_cpp \
POZIFY_COACH_SUMMARY_MODEL=local-nemotron-3-nano-4b-gguf \
POZIFY_LLAMA_CPP_BASE_URL=http://127.0.0.1:8080 \
POZIFY_COACH_SUMMARY_MAX_TOKENS=700 \
uv run python app.py

Useful Environment Variables

Variable	Purpose
`POZIFY_ROUTER_DEVICE`	Override router device, for example `cpu` or `cuda`.
`POZIFY_SPACES_GPU_DURATION`	`spaces.GPU` duration in seconds, default `120`.
`POZIFY_COACH_SUMMARY_PROVIDER`	`hf_inference`, `local_transformers`, or `llama_cpp`.
`POZIFY_COACH_SUMMARY_MODEL`	Coach model id or llama.cpp model alias.
`POZIFY_COACH_SUMMARY_LOCAL_MODEL_DIR`	Prefer a local merged/model directory for coach summary.
`POZIFY_COACH_SUMMARY_MAX_INPUT_TOKENS`	Max local Transformers prompt tokens, default `2048`.
`POZIFY_COACH_SUMMARY_BYPASS_VERIFIER`	Keep model output even when verifier fails.

Exercise Router Training

Run the full router training and publish flow:

uv run modal run scripts/exercise_router_modal.py \
  --stage all \
  --repo-id build-small-hackathon/pozify-exercise-router

Step-by-step:

uv run modal run scripts/exercise_router_modal.py --stage ingest
uv run modal run scripts/exercise_router_modal.py --stage features
uv run modal run scripts/exercise_router_modal.py --stage train-baseline
uv run modal run scripts/exercise_router_modal.py --stage train-temporal
uv run modal run scripts/exercise_router_modal.py --stage evaluate
uv run modal run scripts/exercise_router_modal.py --stage publish --repo-id build-small-hackathon/pozify-exercise-router

The active router artifact is temporal.pt; the baseline is retained for comparison and fallback.

Coach Summary Training

Build the grounded SFT dataset:

uv run python scripts/build_coach_summary_sft_dataset.py

Run the full coach-summary Modal flow:

uv run modal run scripts/coach_summary_modal.py \
  --stage all \
  --epochs 2 \
  --style-weight 0.2 \
  --repo-id build-small-hackathon/pozify-coach-summary1

The checked-in fine-tune config uses nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 as the base model. The Modal training, evaluation, and merge stages request an A100-80GB GPU because the Nemotron base model can run out of CUDA memory on the previous A10G setting.

Step-by-step:

uv run modal run scripts/coach_summary_modal.py --stage prepare-data
uv run modal run scripts/coach_summary_modal.py --stage train --epochs 2 --style-weight 0.2
uv run modal run scripts/coach_summary_modal.py --stage evaluate --limit 5
uv run modal run scripts/coach_summary_modal.py --stage merge
uv run modal run scripts/coach_summary_modal.py --stage publish-merged --repo-id build-small-hackathon/pozify-coach-summary1

Important runtime note:

the default coach model is build-small-hackathon/pozify-coach-summary1
Hugging Face hosted inference may still reject a repo or produce invalid JSON, so the conservative fallback summary stays enabled
for the most predictable fine-tuned inference path, use POZIFY_COACH_SUMMARY_LOCAL_MODEL_DIR

Generated Artifacts

Each run creates runs/<run_id>/ with:

manifest.json
user_profile.json
video_manifest.json
pose_sequence.json
exercise_classification.json
reps.json
rep_debug.json
rep_analysis.json
variation.json
issue_markers.json
annotated_video.mp4
coach_summary.json
verification.json
final_report.json

JSON artifacts are validated before they are written. The final report records:

analysis mode
pose source
knowledge-card provenance
coach summary provider/model/source
verifier status and bypass flags

Development Checks

uv run ruff check
uv run python -m compileall src scripts tests app.py
uv run python -m unittest discover -s tests

Run the real MediaPipe fixture smoke test only when the fixture is available:

POZIFY_RUN_REAL_POSE_TESTS=1 \
uv run python -m unittest tests.test_pose_steps.PoseStepTests.test_real_sample_mov_extracts_pose_landmarks