{ "cells": [ { "cell_type": "markdown", "id": "c26305a6", "metadata": {}, "source": [ "\n", "# A8 Report\n", "\n", "---\n", "\n", "## Introduction & Objectives\n", "\n", "This notebook documents the current pose estimation system, how to install and run it, the main architectural decisions, and the data formats used.\n", "\n", "### Pose estimator\n", "The uploaded `pose_estimator.py` module already provides:\n", "- MoveNet model loading from TensorFlow Hub\n", "- Image preprocessing\n", "- Single-image pose detection\n", "- Video frame-by-frame pose extraction\n", "- Skeleton overlay rendering\n", "- CLI entry points for image, video, and webcam usage\n" ] }, { "cell_type": "markdown", "id": "0b06d280", "metadata": {}, "source": [ "\n", "## Environment Setup & Installation\n", "\n", "### Installation steps\n", "\n", "#### 1. Create and activate a virtual environment\n", "```bash\n", "python -m venv .venv\n", "```\n", "\n", "**Windows**\n", "```bash\n", ".venv\\Scripts\\activate\n", "```\n", "\n", "**macOS / Linux**\n", "```bash\n", "source .venv/bin/activate\n", "```\n", "\n", "#### 2. Upgrade packaging tools\n", "```bash\n", "python -m pip install --upgrade pip setuptools wheel\n", "```\n", "\n", "#### 3. Install dependencies\n", "```bash\n", "pip install \"tensorflow>=2.13,<3.0\" tensorflow-hub opencv-python numpy pandas matplotlib jupyter\n", "```\n", "\n", "#### 4. 
Verify installation\n", "```bash\n", "python -c \"import tensorflow as tf; import tensorflow_hub as hub; import cv2; import numpy; import pandas; print(tf.__version__)\"\n", "```" ] }, { "cell_type": "code", "execution_count": 20, "id": "5d3784a2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'tensorflow': True,\n", " 'tensorflow_hub': False,\n", " 'opencv_python': True,\n", " 'numpy': True,\n", " 'pandas': True,\n", " 'matplotlib': True}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pathlib import Path\n", "import importlib.util\n", "import json\n", "\n", "# Base directory = where the notebook is\n", "BASE_DIR = Path.cwd()\n", "\n", "# All files are in the same directory\n", "POSE_ESTIMATOR_PATH = BASE_DIR / \"pose_estimator.py\"\n", "IMAGE_PATH = BASE_DIR / \"test_person.jpg\"\n", "ANNOTATED_IMAGE_PATH = BASE_DIR / \"test_person_annotated.jpg\"\n", "CSV_PATH = BASE_DIR / \"sample_keypoints.csv\"\n", "JSON_PATH = BASE_DIR / \"sample_keypoints.json\"\n", "\n", "def has_package(name: str) -> bool:\n", " return importlib.util.find_spec(name) is not None\n", "\n", "environment_status = {\n", " \"tensorflow\": has_package(\"tensorflow\"),\n", " \"tensorflow_hub\": has_package(\"tensorflow_hub\"),\n", " \"opencv_python\": has_package(\"cv2\"),\n", " \"numpy\": has_package(\"numpy\"),\n", " \"pandas\": has_package(\"pandas\"),\n", " \"matplotlib\": has_package(\"matplotlib\"),\n", "}\n", "\n", "environment_status" ] }, { "cell_type": "markdown", "id": "74fa9eca", "metadata": {}, "source": [ "\n", "## Pose Estimation Library Overview\n", "\n", "### MoveNet\n", "MoveNet is a lightweight single-person pose estimation model distributed through TensorFlow Hub. 
It outputs **17 COCO keypoints**, each with:\n", "\n", "- `x`: normalized horizontal coordinate in the range `[0, 1]`\n", "- `y`: normalized vertical coordinate in the range `[0, 1]`\n", "- `confidence`: keypoint confidence score in the range `[0, 1]`\n", "\n", "### Model variants in the current code\n", "The current implementation supports two variants:\n", "\n", "| Variant | Input size | Tradeoff |\n", "|---|---:|---|\n", "| `lightning` | 192 x 192 | Faster inference, slightly lower accuracy |\n", "| `thunder` | 256 x 256 | Slower inference, higher accuracy |\n", "\n", "### COCO keypoints\n", "The code defines the standard 17 COCO keypoints:\n", "\n", "`nose, left_eye, right_eye, left_ear, right_ear, left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist, left_hip, right_hip, left_knee, right_knee, left_ankle, right_ankle`\n", "\n", "### High-level data flow\n", "```text\n", "Input image/video\n", " -> OpenCV read\n", " -> BGR to RGB conversion\n", " -> Resize with padding to MoveNet input size\n", " -> TensorFlow Hub inference\n", " -> Raw [y, x, confidence] output\n", " -> Parsed keypoint dictionary\n", " -> Optional visualization overlay\n", " -> Optional CSV / JSON export\n", "```\n" ] }, { "cell_type": "markdown", "id": "cc765d7e", "metadata": {}, "source": [ "\n", "## Code Walkthrough & Changes\n", "\n", "### Module structure\n", "\n", "- `MoveNetPoseEstimator`\n", " - model loading\n", " - preprocessing\n", " - pose inference\n", " - image visualization\n", " - video processing\n", " - image file processing\n", "\n", "### Architecture summary\n", "```text\n", "pose_estimator.py\n", "├── constants\n", "│ ├── KEYPOINT_NAMES\n", "│ ├── KEYPOINT_EDGES\n", "│ └── EDGE_COLORS\n", "├── class MoveNetPoseEstimator\n", "│ ├── __init__()\n", "│ ├── preprocess_image()\n", "│ ├── detect_pose()\n", "│ ├── detect_pose_raw()\n", "│ ├── draw_keypoints()\n", "│ ├── process_video()\n", "│ └── process_image_file()\n", "└── main() CLI demo\n", "```\n", 
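"\n", "The step in the data flow from raw `[y, x, confidence]` output to a parsed keypoint dictionary can be sketched as follows. This is a minimal illustration under the assumption that MoveNet returns a `(1, 1, 17, 3)` array; the exact `detect_pose()` body in `pose_estimator.py` may differ:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"KEYPOINT_NAMES = [\n",
"    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',\n",
"    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',\n",
"    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',\n",
"    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',\n",
"]\n",
"\n",
"def parse_keypoints(raw):\n",
"    # MoveNet emits one [y, x, confidence] row per COCO keypoint.\n",
"    points = np.asarray(raw).reshape(17, 3)\n",
"    return {\n",
"        name: {'x': float(x), 'y': float(y), 'confidence': float(c)}\n",
"        for name, (y, x, c) in zip(KEYPOINT_NAMES, points)\n",
"    }\n",
"```\n",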
"\n", "### Main code changes added relative to a plain TF Hub demo\n", "1. **Reusable class wrapper**\n", " - Encapsulates loading, preprocessing, inference, and rendering into a reusable object.\n", "2. **Named keypoint output**\n", " - Converts raw MoveNet tensor output into a dictionary keyed by body-part name.\n", "3. **Visualization layer**\n", " - Adds skeleton edges and keypoint circles with configurable confidence threshold.\n", "4. **Video processing pipeline**\n", " - Reads video frame-by-frame, runs inference, stores per-frame results, and optionally writes an annotated MP4.\n", "5. **CLI support**\n", " - Adds `--image`, `--video`, `--webcam`, `--output`, and model selection flags for local testing.\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "c5a2e7b1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pose_estimator import available but not fully executable in this environment: dlopen(/Users/reemothman/miniconda3/lib/python3.12/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Library not loaded: @rpath/_pywrap_tensorflow_internal.so\n", " Referenced from: <8B62586B-B082-3113-93AB-FD766A9960AE> /Users/reemothman/miniconda3/lib/python3.12/site-packages/tensorflow-plugins/libmetal_plugin.dylib\n", " Reason: tried: '/Users/reemothman/miniconda3/lib/python3.12/site-packages/tensorflow-plugins/../_solib_darwin_arm64/_U@local_Uconfig_Utf_S_S_C_Upywrap_Utensorflow_Uinternal___Uexternal_Slocal_Uconfig_Utf/_pywrap_tensorflow_internal.so' (no such file), '/Users/reemothman/miniconda3/lib/python3.12/site-packages/tensorflow-plugins/../_solib_darwin_arm64/_U@local_Uconfig_Utf_S_S_C_Upywrap_Utensorflow_Uinternal___Uexternal_Slocal_Uconfig_Utf/_pywrap_tensorflow_internal.so' (no such file), '/Users/reemothman/miniconda3/bin/../lib/_pywrap_tensorflow_internal.so' (no such file)\n", "Compatibility wrapper ready.\n" ] } ], "source": [ "\n", "import sys\n", "from pathlib import Path\n", "import pandas as 
pd\n", "\n", "if str(POSE_ESTIMATOR_PATH.parent) not in sys.path:\n", " sys.path.insert(0, str(POSE_ESTIMATOR_PATH.parent))\n", "\n", "MoveNetPoseEstimator = None\n", "if POSE_ESTIMATOR_PATH.exists() and has_package(\"cv2\"):\n", " try:\n", " from pose_estimator import MoveNetPoseEstimator\n", " except Exception as exc:\n", " print(f\"pose_estimator import available but not fully executable in this environment: {exc}\")\n", "\n", "class KeypointExtractor:\n", " \"\"\"Compatibility wrapper matching the issue's expected API.\"\"\"\n", "\n", " def __init__(self, model: str = \"movenet\", variant: str = \"lightning\"):\n", " if model.lower() != \"movenet\":\n", " raise ValueError(\"This wrapper currently supports model='movenet' only.\")\n", " if MoveNetPoseEstimator is None:\n", " self.estimator = None\n", " else:\n", " self.estimator = MoveNetPoseEstimator(model_name=variant)\n", " self.model = model\n", " self.variant = variant\n", "\n", " def extract_from_image(self, image_path: str):\n", " if self.estimator is None:\n", " raise RuntimeError(\"MoveNetPoseEstimator is unavailable. Install TensorFlow dependencies first.\")\n", " return self.estimator.process_image_file(image_path)\n", "\n", " def extract_from_video(self, video_path: str, output_path: str | None = None):\n", " if self.estimator is None:\n", " raise RuntimeError(\"MoveNetPoseEstimator is unavailable. 
Install TensorFlow dependencies first.\")\n", " return self.estimator.process_video(video_path, output_path=output_path)\n", "\n", " @staticmethod\n", " def to_flat_dataframe(results):\n", " rows = []\n", " if isinstance(results, dict):\n", " results = [dict(frame_id=0, timestamp=0.0, **results)]\n", " for item in results:\n", " frame_id = item.get(\"frame_id\", 0)\n", " timestamp = item.get(\"timestamp\", 0.0)\n", " inference_time_ms = item.get(\"inference_time_ms\")\n", " for keypoint_name, kp in item[\"keypoints\"].items():\n", " rows.append({\n", " \"frame_id\": frame_id,\n", " \"timestamp\": timestamp,\n", " \"inference_time_ms\": inference_time_ms,\n", " \"keypoint\": keypoint_name,\n", " \"x\": kp[\"x\"],\n", " \"y\": kp[\"y\"],\n", " \"confidence\": kp[\"confidence\"],\n", " })\n", " return pd.DataFrame(rows)\n", "\n", " def save_to_csv(self, results, output_csv: str):\n", " df = self.to_flat_dataframe(results)\n", " df.to_csv(output_csv, index=False)\n", " return output_csv\n", "\n", " def save_to_json(self, results, output_json: str):\n", " import json\n", " payload = {\"model\": f\"{self.model}/{self.variant}\", \"frames\": results if isinstance(results, list) else [results]}\n", " with open(output_json, \"w\", encoding=\"utf-8\") as f:\n", " json.dump(payload, f, indent=2)\n", " return output_json\n", "\n", "print(\"Compatibility wrapper ready.\")\n" ] }, { "cell_type": "markdown", "id": "a2283dc2", "metadata": {}, "source": [ "\n", "## Usage Examples\n", "\n", "The following cells are designed to be executable in the notebook.\n", "\n", "- In the current environment, `tensorflow_hub` is not installed and the local TensorFlow build fails to load (see the import error above), so the actual inference cells are expected to be skipped.\n", "- Once the environment setup above is completed, the same cells can be run end-to-end without modification.\n" ] }, { "cell_type": "code", "execution_count": 22, "id": "bf0a8511", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Image example ready. 
Install TensorFlow and tensorflow-hub, then run this cell again.\n" ] } ], "source": [ "\n", "# Example 1: Image processing usage\n", "if environment_status[\"tensorflow\"] and environment_status[\"tensorflow_hub\"] and POSE_ESTIMATOR_PATH.exists() and IMAGE_PATH.exists():\n", " extractor = KeypointExtractor(model=\"movenet\", variant=\"lightning\")\n", " image_result = extractor.extract_from_image(str(IMAGE_PATH))\n", " print(image_result)\n", "else:\n", " print(\"Image example ready. Install TensorFlow and tensorflow-hub, then run this cell again.\")\n" ] }, { "cell_type": "code", "execution_count": 23, "id": "461a1b3b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Place a test video at ./video.mp4 after environment setup, then rerun this cell.\n" ] } ], "source": [ "\n", "# Example 2: Video processing usage\n", "video_path = BASE_DIR / \"video.mp4\"\n", "if environment_status[\"tensorflow\"] and environment_status[\"tensorflow_hub\"] and POSE_ESTIMATOR_PATH.exists() and video_path.exists():\n", " extractor = KeypointExtractor(model=\"movenet\", variant=\"lightning\")\n", " video_results = extractor.extract_from_video(str(video_path))\n", " output_csv = BASE_DIR / \"video_keypoints.csv\"\n", " extractor.save_to_csv(video_results, str(output_csv))\n", " print(f\"Saved: {output_csv}\")\n", "else:\n", " print(\"Place a test video at ./video.mp4 after environment setup, then rerun this cell.\")\n" ] }, { "cell_type": "markdown", "id": "25295128", "metadata": {}, "source": [ "\n", "## Data Format Documentation\n", "\n", "### CSV schema\n", "A flat CSV export is useful for analysis in pandas, Excel, or downstream scripts.\n", "\n", "| Field | Type | Description |\n", "|---|---|---|\n", "| `frame_id` | integer | Frame index in the source video. `0` for single-image mode. |\n", "| `timestamp` | float | Time in seconds from the start of the video. |\n", "| `inference_time_ms` | float | Per-frame inference time in milliseconds. 
|\n", "| `keypoint` | string | One of the 17 COCO keypoint names. |\n", "| `x` | float | Normalized x coordinate in the range `[0, 1]`. |\n", "| `y` | float | Normalized y coordinate in the range `[0, 1]`. |\n", "| `confidence` | float | Confidence score in the range `[0, 1]`. |\n", "\n", "### JSON schema\n", "A nested JSON export preserves frame structure and is easier to use in applications that consume hierarchical data.\n", "\n", "```json\n", "{\n", " \"model\": \"movenet/lightning\",\n", " \"frames\": [\n", " {\n", " \"frame_id\": 0,\n", " \"timestamp\": 0.0,\n", " \"inference_time_ms\": 28.4,\n", " \"keypoints\": {\n", " \"nose\": {\"x\": 0.498, \"y\": 0.404, \"confidence\": 0.990}\n", " }\n", " }\n", " ]\n", "}\n", "```\n", "\n", "### Coordinate system\n", "- Coordinates are **normalized** relative to image width and height.\n", "- `(0, 0)` is the **top-left** corner.\n", "- `x` increases from left to right.\n", "- `y` increases from top to bottom.\n", "- To convert normalized coordinates to pixel positions:\n", " - `pixel_x = x * image_width`\n", " - `pixel_y = y * image_height`\n", "\n", "### Confidence score interpretation\n", "| Confidence range | Interpretation |\n", "|---:|---|\n", "| `>= 0.80` | Very reliable keypoint |\n", "| `0.50 - 0.79` | Usable but should be checked |\n", "| `0.30 - 0.49` | Weak detection; often filtered in visualization |\n", "| `< 0.30` | Usually unreliable/noisy |\n" ] }, { "cell_type": "code", "execution_count": 24, "id": "4c1549df", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | frame_id | \n", "timestamp | \n", "inference_time_ms | \n", "keypoint | \n", "x | \n", "y | \n", "confidence | \n", "
|---|---|---|---|---|---|---|---|
| 0 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "nose | \n", "0.498 | \n", "0.404 | \n", "0.990 | \n", "
| 1 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "left_eye | \n", "0.456 | \n", "0.361 | \n", "0.975 | \n", "
| 2 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "right_eye | \n", "0.562 | \n", "0.362 | \n", "0.973 | \n", "
| 3 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "left_ear | \n", "0.392 | \n", "0.394 | \n", "0.911 | \n", "
| 4 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "right_ear | \n", "0.640 | \n", "0.403 | \n", "0.918 | \n", "
| 5 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "left_shoulder | \n", "0.252 | \n", "0.732 | \n", "0.942 | \n", "
| 6 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "right_shoulder | \n", "0.774 | \n", "0.731 | \n", "0.946 | \n", "
| 7 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "left_elbow | \n", "0.305 | \n", "0.690 | \n", "0.440 | \n", "
| 8 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "right_elbow | \n", "0.708 | \n", "0.690 | \n", "0.452 | \n", "
| 9 | \n", "0 | \n", "0.0 | \n", "28.4 | \n", "left_wrist | \n", "0.280 | \n", "0.770 | \n", "0.181 | \n", "