update
Browse files- .gitattributes +1 -1
- .gitignore +5 -0
- keypoint → 20251029-detection.pt +2 -2
- football_keypoints_detection.pt → 20251029-keypoint.pt +0 -0
- README.md +132 -0
- SV_kp.engine +3 -0
- config.yml +2 -3
- football_object_detection.onnx → detection.onnx +0 -0
- detection.pt +3 -0
- hrnetv2_w48.yaml +0 -35
- keypoint.pt +3 -0
- miner.py +439 -359
- object-detection.onnx +3 -0
- pitch.py +25 -15
- player.py +388 -0
.gitattributes
CHANGED
|
@@ -33,5 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
-
|
| 37 |
osnet_model.pth.tar-100 filter=lfs diff=lfs merge=lfs -text
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
SV_kp.engine filter=lfs diff=lfs merge=lfs -text
|
| 37 |
osnet_model.pth.tar-100 filter=lfs diff=lfs merge=lfs -text
|
.gitignore
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
venv
|
| 2 |
+
outputs
|
| 3 |
+
test_predict_batch.py
|
| 4 |
+
test.mp4
|
| 5 |
+
inspect_yolo_model.py
|
keypoint → 20251029-detection.pt
RENAMED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8bbacfcb38e38b1b8816788e9e6e845160533719a0b87b693d58b932380d0d28
|
| 3 |
+
size 152961687
|
football_keypoints_detection.pt → 20251029-keypoint.pt
RENAMED
|
File without changes
|
README.md
ADDED
|
@@ -0,0 +1,132 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
🚀 Example Chute for Turbovision 🪂
|
| 2 |
+
|
| 3 |
+
This repository demonstrates how to deploy a Chute via the Turbovision CLI, hosted on Hugging Face Hub. It serves as a minimal example showcasing the required structure and workflow for integrating machine learning models, preprocessing, and orchestration into a reproducible Chute environment.
|
| 4 |
+
|
| 5 |
+
## Repository Structure
|
| 6 |
+
|
| 7 |
+
The following two files must be present (in their current locations) for a successful deployment — their content can be modified as needed:
|
| 8 |
+
|
| 9 |
+
| File | Purpose |
|
| 10 |
+
|------|---------|
|
| 11 |
+
| `miner.py` | Defines the ML model type(s), orchestration, and all pre/postprocessing logic. |
|
| 12 |
+
| `config.yml` | Specifies machine configuration (e.g., GPU type, memory, environment variables). |
|
| 13 |
+
|
| 14 |
+
Other files — e.g., model weights, utility scripts, or dependencies — are optional and can be included as needed for your model.
|
| 15 |
+
|
| 16 |
+
> **Note**: Any required assets must be defined or contained within this repo, which is fully open-source, since all network-related operations (downloading challenge data, weights, etc.) are disabled inside the Chute.
|
| 17 |
+
|
| 18 |
+
## Overview
|
| 19 |
+
|
| 20 |
+
Below is a high-level diagram showing the interaction between Huggingface, Chutes and Turbovision:
|
| 21 |
+
|
| 22 |
+
```
|
| 23 |
+
┌─────────────┐ ┌──────────┐ ┌──────────────┐
|
| 24 |
+
│ HuggingFace │ ───> │ Chutes │ ───> │ Turbovision │
|
| 25 |
+
│ Hub │ │ .ai │ │ Validator │
|
| 26 |
+
└─────────────┘ └──────────┘ └──────────────┘
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
## Local Testing
|
| 30 |
+
|
| 31 |
+
After editing the `config.yml` and `miner.py` and saving it into your Huggingface Repo, you will want to test it works locally.
|
| 32 |
+
|
| 33 |
+
1. **Copy the template file** `scorevision/chute_template/turbovision_chute.py.j2` as a python file called `my_chute.py` and fill in the missing variables:
|
| 34 |
+
|
| 35 |
+
```python
|
| 36 |
+
HF_REPO_NAME = "{{ huggingface_repository_name }}"
|
| 37 |
+
HF_REPO_REVISION = "{{ huggingface_repository_revision }}"
|
| 38 |
+
CHUTES_USERNAME = "{{ chute_username }}"
|
| 39 |
+
CHUTE_NAME = "{{ chute_name }}"
|
| 40 |
+
```
|
| 41 |
+
|
| 42 |
+
2. **Run the following command to build the chute locally** (Caution: there are known issues with the docker location when running this on a mac):
|
| 43 |
+
|
| 44 |
+
```bash
|
| 45 |
+
chutes build my_chute:chute --local --public
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
3. **Run the name of the docker image just built** (i.e. `CHUTE_NAME`) and enter it:
|
| 49 |
+
|
| 50 |
+
```bash
|
| 51 |
+
docker run -p 8000:8000 -e CHUTES_EXECUTION_CONTEXT=REMOTE -it <image-name> /bin/bash
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
4. **Run the file from within the container**:
|
| 55 |
+
|
| 56 |
+
```bash
|
| 57 |
+
chutes run my_chute:chute --dev --debug
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
5. **In another terminal, test the local endpoints** to ensure there are no bugs:
|
| 61 |
+
|
| 62 |
+
```bash
|
| 63 |
+
# Health check
|
| 64 |
+
curl -X POST http://localhost:8000/health -d '{}'
|
| 65 |
+
|
| 66 |
+
# Prediction test
|
| 67 |
+
curl -X POST http://localhost:8000/predict -d '{"url": "https://scoredata.me/2025_03_14/35ae7a/h1_0f2ca0.mp4","meta": {}}'
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
## Live Testing
|
| 71 |
+
|
| 72 |
+
If you have any chute with the same name (i.e. from a previous deployment), ensure you delete that first (or you will get an error when trying to build).
|
| 73 |
+
|
| 74 |
+
1. **List existing chutes**:
|
| 75 |
+
|
| 76 |
+
```bash
|
| 77 |
+
chutes chutes list
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
+
Take note of the chute id that you wish to delete (if any):
|
| 81 |
+
|
| 82 |
+
```bash
|
| 83 |
+
chutes chutes delete <chute-id>
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
2. **You should also delete its associated image**:
|
| 87 |
+
|
| 88 |
+
```bash
|
| 89 |
+
chutes images list
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
Take note of the chute image id:
|
| 93 |
+
|
| 94 |
+
```bash
|
| 95 |
+
chutes images delete <chute-image-id>
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
3. **Use Turbovision's CLI to build, deploy and commit on-chain**:
|
| 99 |
+
|
| 100 |
+
```bash
|
| 101 |
+
sv -vv push
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
> **Note**: You can skip the on-chain commit using `--no-commit`. You can also specify a past huggingface revision to point to using `--revision` and/or the local files you want to upload to your huggingface repo using `--model-path`.
|
| 105 |
+
|
| 106 |
+
4. **When completed, warm up the chute** (if its cold 🧊):
|
| 107 |
+
|
| 108 |
+
You can confirm its status using `chutes chutes list` or `chutes chutes get <chute-id>` if you already know its id.
|
| 109 |
+
|
| 110 |
+
> **Note**: Warming up can sometimes take a while but if the chute runs without errors (should be if you've tested locally first) and there are sufficient nodes (i.e. machines) available matching the `config.yml` you specified, the chute should become hot 🔥!
|
| 111 |
+
|
| 112 |
+
```bash
|
| 113 |
+
chutes warmup <chute-id>
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
5. **Test the chute's endpoints**:
|
| 117 |
+
|
| 118 |
+
```bash
|
| 119 |
+
# Health check
|
| 120 |
+
curl -X POST https://<YOUR-CHUTE-SLUG>.chutes.ai/health -d '{}' -H "Authorization: Bearer $CHUTES_API_KEY"
|
| 121 |
+
|
| 122 |
+
# Prediction
|
| 123 |
+
curl -X POST https://<YOUR-CHUTE-SLUG>.chutes.ai/predict -d '{"url": "https://scoredata.me/2025_03_14/35ae7a/h1_0f2ca0.mp4","meta": {}}' -H "Authorization: Bearer $CHUTES_API_KEY"
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
6. **Test what your chute would get on a validator**:
|
| 127 |
+
|
| 128 |
+
This also applies any validation/integrity checks which may fail if you did not use the Turbovision CLI above to deploy the chute:
|
| 129 |
+
|
| 130 |
+
```bash
|
| 131 |
+
sv -vv run-once
|
| 132 |
+
```
|
SV_kp.engine
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f99452eb79e064189e2758abd20a78845a5b639fc8b9c4bc650519c83e13e8db
|
| 3 |
+
size 368289641
|
config.yml
CHANGED
|
@@ -2,15 +2,14 @@ Image:
|
|
| 2 |
from_base: parachutes/python:3.12
|
| 3 |
run_command:
|
| 4 |
- pip install --upgrade setuptools wheel
|
| 5 |
-
- pip install
|
| 6 |
-
- pip install "ultralytics==8.3.222" "opencv-python-headless" "numpy" "pydantic"
|
| 7 |
- pip install scikit-learn
|
| 8 |
- pip install onnxruntime-gpu
|
| 9 |
set_workdir: /app
|
| 10 |
|
| 11 |
NodeSelector:
|
| 12 |
gpu_count: 1
|
| 13 |
-
min_vram_gb_per_gpu:
|
| 14 |
exclude:
|
| 15 |
- "5090"
|
| 16 |
- b200
|
|
|
|
| 2 |
from_base: parachutes/python:3.12
|
| 3 |
run_command:
|
| 4 |
- pip install --upgrade setuptools wheel
|
| 5 |
+
- pip install ultralytics==8.3.222 opencv-python-headless numpy pydantic
|
|
|
|
| 6 |
- pip install scikit-learn
|
| 7 |
- pip install onnxruntime-gpu
|
| 8 |
set_workdir: /app
|
| 9 |
|
| 10 |
NodeSelector:
|
| 11 |
gpu_count: 1
|
| 12 |
+
min_vram_gb_per_gpu: 16
|
| 13 |
exclude:
|
| 14 |
- "5090"
|
| 15 |
- b200
|
football_object_detection.onnx → detection.onnx
RENAMED
|
File without changes
|
detection.pt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2ad3e89b658d2626c34174f6799d240ffd37cfe45752c0ce6ef73b05935042e0
|
| 3 |
+
size 52014742
|
hrnetv2_w48.yaml
DELETED
|
@@ -1,35 +0,0 @@
|
|
| 1 |
-
MODEL:
|
| 2 |
-
IMAGE_SIZE: [960, 540]
|
| 3 |
-
NUM_JOINTS: 58
|
| 4 |
-
PRETRAIN: ''
|
| 5 |
-
EXTRA:
|
| 6 |
-
FINAL_CONV_KERNEL: 1
|
| 7 |
-
STAGE1:
|
| 8 |
-
NUM_MODULES: 1
|
| 9 |
-
NUM_BRANCHES: 1
|
| 10 |
-
BLOCK: BOTTLENECK
|
| 11 |
-
NUM_BLOCKS: [4]
|
| 12 |
-
NUM_CHANNELS: [64]
|
| 13 |
-
FUSE_METHOD: SUM
|
| 14 |
-
STAGE2:
|
| 15 |
-
NUM_MODULES: 1
|
| 16 |
-
NUM_BRANCHES: 2
|
| 17 |
-
BLOCK: BASIC
|
| 18 |
-
NUM_BLOCKS: [4, 4]
|
| 19 |
-
NUM_CHANNELS: [48, 96]
|
| 20 |
-
FUSE_METHOD: SUM
|
| 21 |
-
STAGE3:
|
| 22 |
-
NUM_MODULES: 4
|
| 23 |
-
NUM_BRANCHES: 3
|
| 24 |
-
BLOCK: BASIC
|
| 25 |
-
NUM_BLOCKS: [4, 4, 4]
|
| 26 |
-
NUM_CHANNELS: [48, 96, 192]
|
| 27 |
-
FUSE_METHOD: SUM
|
| 28 |
-
STAGE4:
|
| 29 |
-
NUM_MODULES: 3
|
| 30 |
-
NUM_BRANCHES: 4
|
| 31 |
-
BLOCK: BASIC
|
| 32 |
-
NUM_BLOCKS: [4, 4, 4, 4]
|
| 33 |
-
NUM_CHANNELS: [48, 96, 192, 384]
|
| 34 |
-
FUSE_METHOD: SUM
|
| 35 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
keypoint.pt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6dd10dba85895c92760cdb5a99c5cfca899c68f361a66c5448f38a187280ee1f
|
| 3 |
+
size 6849672
|
miner.py
CHANGED
|
@@ -1,34 +1,23 @@
|
|
| 1 |
from pathlib import Path
|
| 2 |
-
from typing import List, Tuple, Dict
|
| 3 |
-
import sys
|
| 4 |
-
import os
|
| 5 |
-
|
| 6 |
-
from numpy import ndarray
|
| 7 |
-
from pydantic import BaseModel
|
| 8 |
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
|
| 9 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
from ultralytics import YOLO
|
|
|
|
|
|
|
| 11 |
from team_cluster import TeamClassifier
|
| 12 |
from utils import (
|
| 13 |
BoundingBox,
|
| 14 |
Constants,
|
|
|
|
|
|
|
| 15 |
)
|
| 16 |
|
| 17 |
-
import time
|
| 18 |
-
import torch
|
| 19 |
-
import gc
|
| 20 |
-
from pitch import process_batch_input, get_cls_net
|
| 21 |
-
import yaml
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
class BoundingBox(BaseModel):
|
| 25 |
-
x1: int
|
| 26 |
-
y1: int
|
| 27 |
-
x2: int
|
| 28 |
-
y2: int
|
| 29 |
-
cls_id: int
|
| 30 |
-
conf: float
|
| 31 |
-
|
| 32 |
|
| 33 |
class TVFrameResult(BaseModel):
|
| 34 |
frame_id: int
|
|
@@ -37,6 +26,10 @@ class TVFrameResult(BaseModel):
|
|
| 37 |
|
| 38 |
|
| 39 |
class Miner:
|
|
|
|
|
|
|
|
|
|
|
|
|
| 40 |
SMALL_CONTAINED_IOA = Constants.SMALL_CONTAINED_IOA
|
| 41 |
SMALL_RATIO_MAX = Constants.SMALL_RATIO_MAX
|
| 42 |
SINGLE_PLAYER_HUE_PIVOT = Constants.SINGLE_PLAYER_HUE_PIVOT
|
|
@@ -45,385 +38,472 @@ class Miner:
|
|
| 45 |
CORNER_CONFIDENCE = Constants.CORNER_CONFIDENCE
|
| 46 |
GOALKEEPER_POSITION_MARGIN = Constants.GOALKEEPER_POSITION_MARGIN
|
| 47 |
MIN_SAMPLES_FOR_FIT = 16 # Minimum player crops needed before fitting TeamClassifier
|
| 48 |
-
MAX_SAMPLES_FOR_FIT =
|
| 49 |
|
| 50 |
def __init__(self, path_hf_repo: Path) -> None:
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
model = get_cls_net(cfg_kp)
|
| 76 |
-
model.load_state_dict(loaded_state_kp)
|
| 77 |
-
model.to(device)
|
| 78 |
-
model.eval()
|
| 79 |
|
| 80 |
-
self.keypoints_model = model
|
| 81 |
-
self.kp_threshold = 0.1
|
| 82 |
-
self.pitch_batch_size = 4
|
| 83 |
-
self.health = "healthy"
|
| 84 |
-
print("✅ Keypoints Model Loaded")
|
| 85 |
-
except Exception as e:
|
| 86 |
-
self.health = "❌ Miner initialization failed: " + str(e)
|
| 87 |
-
print(self.health)
|
| 88 |
|
| 89 |
-
def __repr__(self) -> str:
|
| 90 |
-
if self.health == 'healthy':
|
| 91 |
-
return (
|
| 92 |
-
f"health: {self.health}\n"
|
| 93 |
-
f"BBox Model: {type(self.bbox_model).__name__}\n"
|
| 94 |
-
f"Keypoints Model: {type(self.keypoints_model).__name__}"
|
| 95 |
-
)
|
| 96 |
-
else:
|
| 97 |
-
return self.health
|
| 98 |
|
| 99 |
-
def
|
| 100 |
-
box2: Tuple[float, float, float, float]) -> float:
|
| 101 |
"""
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
Returns:
|
| 107 |
-
|
| 108 |
"""
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 122 |
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 127 |
|
| 128 |
-
if union_area == 0:
|
| 129 |
-
return 0.0
|
| 130 |
|
| 131 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 132 |
|
| 133 |
-
def
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
for frame_number in range(0, n_frames, batch_size):
|
| 138 |
-
batch_images = decoded_images[frame_number: frame_number + batch_size]
|
| 139 |
-
detections = self.bbox_model(batch_images, verbose=False, save=False)
|
| 140 |
-
detection_results.extend(detections)
|
| 141 |
|
| 142 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 143 |
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 150 |
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 154 |
continue
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
self.team_classifier.fit(player_crops_for_fit)
|
| 178 |
-
self.team_classifier_fitted = True
|
| 179 |
-
end = time.time()
|
| 180 |
-
print(f"Fitting Kmeans time: {end - start}")
|
| 181 |
-
|
| 182 |
-
# Second pass: predict teams with configurable frame skipping optimization
|
| 183 |
-
start = time.time()
|
| 184 |
-
|
| 185 |
-
# Get configuration for frame skipping
|
| 186 |
-
prediction_interval = 1 # Default: predict every 2 frames
|
| 187 |
-
iou_threshold = 0.3
|
| 188 |
-
|
| 189 |
-
print(f"Team classification - prediction_interval: {prediction_interval}, iou_threshold: {iou_threshold}")
|
| 190 |
-
|
| 191 |
-
# Storage for predicted frame results: {frame_id: {box_idx: (bbox, team_id)}}
|
| 192 |
-
predicted_frame_data = {}
|
| 193 |
-
|
| 194 |
-
# Step 1: Predict for frames at prediction_interval only
|
| 195 |
-
frames_to_predict = []
|
| 196 |
-
for frame_id in range(len(detection_results)):
|
| 197 |
-
if frame_id % prediction_interval == 0:
|
| 198 |
-
frames_to_predict.append(frame_id)
|
| 199 |
-
|
| 200 |
-
print(f"Predicting teams for {len(frames_to_predict)}/{len(detection_results)} frames "
|
| 201 |
-
f"(saving {100 - (len(frames_to_predict) * 100 // len(detection_results))}% compute)")
|
| 202 |
-
|
| 203 |
-
for frame_id in frames_to_predict:
|
| 204 |
-
detection_box = detection_results[frame_id].boxes.data
|
| 205 |
-
frame_image = decoded_images[frame_id]
|
| 206 |
-
|
| 207 |
-
# Collect player crops for this frame
|
| 208 |
-
frame_player_crops = []
|
| 209 |
-
frame_player_indices = []
|
| 210 |
-
frame_player_boxes = []
|
| 211 |
-
|
| 212 |
-
for idx, box in enumerate(detection_box):
|
| 213 |
-
x1, y1, x2, y2, conf, cls_id = box.tolist()
|
| 214 |
-
if cls_id == 2 and conf < 0.6:
|
| 215 |
continue
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
# Collect player crops for prediction
|
| 219 |
-
if self.team_classifier and self.team_classifier_fitted and mapped_cls_id == '2':
|
| 220 |
-
crop = frame_image[int(y1):int(y2), int(x1):int(x2)]
|
| 221 |
-
if crop.size > 0:
|
| 222 |
-
frame_player_crops.append(crop)
|
| 223 |
-
frame_player_indices.append(idx)
|
| 224 |
-
frame_player_boxes.append((x1, y1, x2, y2))
|
| 225 |
-
|
| 226 |
-
# Predict teams for all players in this frame
|
| 227 |
-
if len(frame_player_crops) > 0:
|
| 228 |
-
team_ids = self.team_classifier.predict(frame_player_crops)
|
| 229 |
-
predicted_frame_data[frame_id] = {}
|
| 230 |
-
for idx, bbox, team_id in zip(frame_player_indices, frame_player_boxes, team_ids):
|
| 231 |
-
# Map team_id (0,1) to cls_id (6,7)
|
| 232 |
-
team_cls_id = str(6 + int(team_id))
|
| 233 |
-
predicted_frame_data[frame_id][idx] = (bbox, team_cls_id)
|
| 234 |
-
|
| 235 |
-
# Step 2: Process all frames (interpolate skipped frames)
|
| 236 |
-
fallback_count = 0
|
| 237 |
-
interpolated_count = 0
|
| 238 |
-
bboxes: dict[int, list[BoundingBox]] = {}
|
| 239 |
-
for frame_id in range(len(detection_results)):
|
| 240 |
-
detection_box = detection_results[frame_id].boxes.data
|
| 241 |
-
frame_image = decoded_images[frame_id]
|
| 242 |
-
boxes = []
|
| 243 |
-
|
| 244 |
-
team_predictions = {}
|
| 245 |
-
|
| 246 |
-
if frame_id % prediction_interval == 0:
|
| 247 |
-
# Predicted frame: use pre-computed predictions
|
| 248 |
-
if frame_id in predicted_frame_data:
|
| 249 |
-
for idx, (bbox, team_cls_id) in predicted_frame_data[frame_id].items():
|
| 250 |
-
team_predictions[idx] = team_cls_id
|
| 251 |
-
else:
|
| 252 |
-
# Skipped frame: interpolate from neighboring predicted frames
|
| 253 |
-
# Find nearest predicted frames
|
| 254 |
-
prev_predicted_frame = (frame_id // prediction_interval) * prediction_interval
|
| 255 |
-
next_predicted_frame = prev_predicted_frame + prediction_interval
|
| 256 |
-
|
| 257 |
-
# Collect current frame player boxes
|
| 258 |
-
for idx, box in enumerate(detection_box):
|
| 259 |
-
x1, y1, x2, y2, conf, cls_id = box.tolist()
|
| 260 |
-
if cls_id == 2 and conf < 0.6:
|
| 261 |
-
continue
|
| 262 |
-
mapped_cls_id = str(int(cls_id))
|
| 263 |
-
|
| 264 |
-
if self.team_classifier and self.team_classifier_fitted and mapped_cls_id == '2':
|
| 265 |
-
target_box = (x1, y1, x2, y2)
|
| 266 |
-
|
| 267 |
-
# Try to match with previous predicted frame
|
| 268 |
-
best_team_id = None
|
| 269 |
-
best_iou = 0.0
|
| 270 |
-
|
| 271 |
-
if prev_predicted_frame in predicted_frame_data:
|
| 272 |
-
team_id, iou = self._find_best_match(
|
| 273 |
-
target_box,
|
| 274 |
-
predicted_frame_data[prev_predicted_frame],
|
| 275 |
-
iou_threshold
|
| 276 |
-
)
|
| 277 |
-
if team_id is not None:
|
| 278 |
-
best_team_id = team_id
|
| 279 |
-
best_iou = iou
|
| 280 |
-
|
| 281 |
-
# Try to match with next predicted frame if available and no good match yet
|
| 282 |
-
if best_team_id is None and next_predicted_frame < len(detection_results):
|
| 283 |
-
if next_predicted_frame in predicted_frame_data:
|
| 284 |
-
team_id, iou = self._find_best_match(
|
| 285 |
-
target_box,
|
| 286 |
-
predicted_frame_data[next_predicted_frame],
|
| 287 |
-
iou_threshold
|
| 288 |
-
)
|
| 289 |
-
if team_id is not None and iou > best_iou:
|
| 290 |
-
best_team_id = team_id
|
| 291 |
-
best_iou = iou
|
| 292 |
-
|
| 293 |
-
# Track interpolation success
|
| 294 |
-
if best_team_id is not None:
|
| 295 |
-
interpolated_count += 1
|
| 296 |
-
else:
|
| 297 |
-
# Fallback: if no match found, predict individually
|
| 298 |
-
crop = frame_image[int(y1):int(y2), int(x1):int(x2)]
|
| 299 |
-
if crop.size > 0:
|
| 300 |
-
team_id = self.team_classifier.predict([crop])[0]
|
| 301 |
-
best_team_id = str(6 + int(team_id))
|
| 302 |
-
fallback_count += 1
|
| 303 |
-
|
| 304 |
-
if best_team_id is not None:
|
| 305 |
-
team_predictions[idx] = best_team_id
|
| 306 |
-
|
| 307 |
-
# Parse boxes with team classification
|
| 308 |
-
for idx, box in enumerate(detection_box):
|
| 309 |
-
x1, y1, x2, y2, conf, cls_id = box.tolist()
|
| 310 |
-
if cls_id == 2 and conf < 0.6:
|
| 311 |
continue
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
|
| 317 |
-
|
| 318 |
-
staff_iou = self._calculate_iou(box[:4], boxy[:4])
|
| 319 |
-
if staff_iou >= 0.8:
|
| 320 |
-
overlap_staff = True
|
| 321 |
break
|
| 322 |
-
|
| 323 |
-
|
| 324 |
-
|
| 325 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 326 |
|
| 327 |
-
|
| 328 |
-
|
| 329 |
-
|
| 330 |
-
|
| 331 |
-
|
| 332 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 333 |
boxes.append(
|
| 334 |
BoundingBox(
|
| 335 |
x1=int(x1),
|
| 336 |
y1=int(y1),
|
| 337 |
x2=int(x2),
|
| 338 |
y2=int(y2),
|
| 339 |
-
cls_id=int(
|
| 340 |
conf=float(conf),
|
| 341 |
)
|
| 342 |
)
|
|
|
|
| 343 |
# Handle footballs - keep only the best one
|
| 344 |
footballs = [bb for bb in boxes if int(bb.cls_id) == 0]
|
| 345 |
if len(footballs) > 1:
|
| 346 |
best_ball = max(footballs, key=lambda b: b.conf)
|
| 347 |
boxes = [bb for bb in boxes if int(bb.cls_id) != 0]
|
| 348 |
boxes.append(best_ball)
|
| 349 |
-
|
| 350 |
-
bboxes[offset + frame_id] = boxes
|
| 351 |
-
return bboxes
|
| 352 |
-
|
| 353 |
|
| 354 |
-
|
| 355 |
-
|
| 356 |
-
|
| 357 |
-
|
| 358 |
-
|
| 359 |
-
|
| 360 |
-
|
| 361 |
-
|
| 362 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 363 |
|
| 364 |
-
pitch_batch_size = min(self.pitch_batch_size, len(batch_images))
|
| 365 |
-
keypoints: Dict[int, List[Tuple[int, int]]] = {}
|
| 366 |
|
| 367 |
-
|
| 368 |
-
|
| 369 |
-
|
| 370 |
-
|
| 371 |
-
|
| 372 |
-
|
| 373 |
-
|
| 374 |
-
|
| 375 |
-
|
| 376 |
-
|
| 377 |
-
|
| 378 |
-
|
| 379 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 380 |
)
|
| 381 |
-
|
| 382 |
-
|
| 383 |
-
|
| 384 |
-
break
|
| 385 |
-
frame_keypoints: List[Tuple[int, int]] = []
|
| 386 |
-
try:
|
| 387 |
-
height, width = batch_images[frame_number_in_batch].shape[:2]
|
| 388 |
-
if kp_dict is not None and isinstance(kp_dict, dict):
|
| 389 |
-
for idx in range(32):
|
| 390 |
-
x, y = 0, 0
|
| 391 |
-
kp_idx = idx + 1
|
| 392 |
-
if kp_idx in kp_dict:
|
| 393 |
-
try:
|
| 394 |
-
kp_data = kp_dict[kp_idx]
|
| 395 |
-
if isinstance(kp_data, dict) and "x" in kp_data and "y" in kp_data:
|
| 396 |
-
x = int(kp_data["x"] * width)
|
| 397 |
-
y = int(kp_data["y"] * height)
|
| 398 |
-
except (KeyError, TypeError, ValueError):
|
| 399 |
-
pass
|
| 400 |
-
frame_keypoints.append((x, y))
|
| 401 |
-
except (IndexError, ValueError, AttributeError):
|
| 402 |
-
frame_keypoints = [(0, 0)] * 32
|
| 403 |
-
if len(frame_keypoints) < n_keypoints:
|
| 404 |
-
frame_keypoints.extend([(0, 0)] * (n_keypoints - len(frame_keypoints)))
|
| 405 |
-
else:
|
| 406 |
-
frame_keypoints = frame_keypoints[:n_keypoints]
|
| 407 |
-
keypoints[offset + frame_number_in_batch] = frame_keypoints
|
| 408 |
-
break
|
| 409 |
-
end = time.time()
|
| 410 |
-
print(f"Keypoint time: {end - start}")
|
| 411 |
-
|
| 412 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 413 |
results: List[TVFrameResult] = []
|
| 414 |
for frame_number in range(offset, offset + len(batch_images)):
|
| 415 |
-
|
| 416 |
-
|
| 417 |
-
|
| 418 |
-
|
| 419 |
-
|
| 420 |
-
|
|
|
|
|
|
|
|
|
|
| 421 |
)
|
| 422 |
-
|
| 423 |
-
|
| 424 |
-
gc.collect()
|
| 425 |
-
if torch.cuda.is_available():
|
| 426 |
-
torch.cuda.empty_cache()
|
| 427 |
-
torch.cuda.synchronize()
|
| 428 |
|
| 429 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
from pathlib import Path
|
| 2 |
+
from typing import List, Tuple, Dict, Optional
|
| 3 |
+
import sys, os
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
|
| 5 |
+
import onnxruntime as ort
|
| 6 |
+
import numpy as np
|
| 7 |
+
import cv2
|
| 8 |
+
from torchvision.ops import batched_nms
|
| 9 |
+
import torch
|
| 10 |
from ultralytics import YOLO
|
| 11 |
+
from numpy import ndarray
|
| 12 |
+
from pydantic import BaseModel
|
| 13 |
from team_cluster import TeamClassifier
|
| 14 |
from utils import (
|
| 15 |
BoundingBox,
|
| 16 |
Constants,
|
| 17 |
+
suppress_small_contained_boxes,
|
| 18 |
+
classify_teams_batch,
|
| 19 |
)
|
| 20 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
class TVFrameResult(BaseModel):
|
| 23 |
frame_id: int
|
|
|
|
| 26 |
|
| 27 |
|
| 28 |
class Miner:
|
| 29 |
+
"""
|
| 30 |
+
Football video analysis system for object detection and team classification.
|
| 31 |
+
"""
|
| 32 |
+
# Use constants from utils
|
| 33 |
SMALL_CONTAINED_IOA = Constants.SMALL_CONTAINED_IOA
|
| 34 |
SMALL_RATIO_MAX = Constants.SMALL_RATIO_MAX
|
| 35 |
SINGLE_PLAYER_HUE_PIVOT = Constants.SINGLE_PLAYER_HUE_PIVOT
|
|
|
|
| 38 |
CORNER_CONFIDENCE = Constants.CORNER_CONFIDENCE
|
| 39 |
GOALKEEPER_POSITION_MARGIN = Constants.GOALKEEPER_POSITION_MARGIN
|
| 40 |
MIN_SAMPLES_FOR_FIT = 16 # Minimum player crops needed before fitting TeamClassifier
|
| 41 |
+
MAX_SAMPLES_FOR_FIT = 500 # Maximum samples to avoid overfitting
|
| 42 |
|
| 43 |
def __init__(self, path_hf_repo: Path) -> None:
|
| 44 |
+
providers = [
|
| 45 |
+
'CUDAExecutionProvider',
|
| 46 |
+
'CPUExecutionProvider'
|
| 47 |
+
]
|
| 48 |
+
model_path = path_hf_repo / "detection.onnx"
|
| 49 |
+
session = ort.InferenceSession(model_path, providers=providers)
|
| 50 |
+
|
| 51 |
+
input_name = session.get_inputs()[0].name
|
| 52 |
+
height = width = 640
|
| 53 |
+
dummy = np.zeros((1, 3, height, width), dtype=np.float32)
|
| 54 |
+
session.run(None, {input_name: dummy})
|
| 55 |
+
model = session
|
| 56 |
+
self.bbox_model = model
|
| 57 |
+
|
| 58 |
+
print("BBox Model Loaded")
|
| 59 |
+
self.keypoints_model = YOLO(path_hf_repo / "keypoint.pt")
|
| 60 |
+
print("Keypoints Model (keypoint.pt) Loaded")
|
| 61 |
+
# Initialize team classifier with OSNet model
|
| 62 |
+
team_model_path = path_hf_repo / "osnet_model.pth.tar-100"
|
| 63 |
+
device = 'cuda'
|
| 64 |
+
self.team_classifier = TeamClassifier(
|
| 65 |
+
device=device,
|
| 66 |
+
batch_size=32,
|
| 67 |
+
model_name=str(team_model_path)
|
| 68 |
+
)
|
| 69 |
+
print("Team Classifier Loaded")
|
| 70 |
+
|
| 71 |
+
# Team classification state
|
| 72 |
+
self.team_classifier_fitted = False
|
| 73 |
+
self.player_crops_for_fit = [] # Collect samples across frames
|
| 74 |
|
| 75 |
+
def __repr__(self) -> str:
|
| 76 |
+
return (
|
| 77 |
+
f"BBox Model: {type(self.bbox_model).__name__}\n"
|
| 78 |
+
f"Keypoints Model: {type(self.keypoints_model).__name__}"
|
| 79 |
+
)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 80 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 81 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 82 |
|
| 83 |
+
def _handle_multiple_goalkeepers(self, boxes: List[BoundingBox]) -> List[BoundingBox]:
|
|
|
|
| 84 |
"""
|
| 85 |
+
Handle goalkeeper detection issues:
|
| 86 |
+
1. Fix misplaced goalkeepers (standing in middle of field)
|
| 87 |
+
2. Limit to maximum 2 goalkeepers (one from each team)
|
| 88 |
+
|
| 89 |
Returns:
|
| 90 |
+
Filtered list of boxes with corrected goalkeepers
|
| 91 |
"""
|
| 92 |
+
# Step 1: Fix misplaced goalkeepers first
|
| 93 |
+
# Convert goalkeepers in middle of field to regular players
|
| 94 |
+
boxes = self._fix_misplaced_goalkeepers(boxes)
|
| 95 |
+
|
| 96 |
+
# Step 2: Handle multiple goalkeepers (after fixing misplaced ones)
|
| 97 |
+
gk_idxs = [i for i, bb in enumerate(boxes) if int(bb.cls_id) == 1]
|
| 98 |
+
if len(gk_idxs) <= 2:
|
| 99 |
+
return boxes
|
| 100 |
+
|
| 101 |
+
# Sort goalkeepers by confidence (highest first)
|
| 102 |
+
gk_idxs_sorted = sorted(gk_idxs, key=lambda i: boxes[i].conf, reverse=True)
|
| 103 |
+
keep_gk_idxs = set(gk_idxs_sorted[:2]) # Keep top 2 goalkeepers
|
| 104 |
+
|
| 105 |
+
# Create new list keeping only top 2 goalkeepers
|
| 106 |
+
filtered_boxes = []
|
| 107 |
+
for i, box in enumerate(boxes):
|
| 108 |
+
if int(box.cls_id) == 1:
|
| 109 |
+
# Only keep the top 2 goalkeepers by confidence
|
| 110 |
+
if i in keep_gk_idxs:
|
| 111 |
+
filtered_boxes.append(box)
|
| 112 |
+
# Skip extra goalkeepers
|
| 113 |
+
else:
|
| 114 |
+
# Keep all non-goalkeeper boxes
|
| 115 |
+
filtered_boxes.append(box)
|
| 116 |
+
|
| 117 |
+
return filtered_boxes
|
| 118 |
|
| 119 |
+
def _fix_misplaced_goalkeepers(self, boxes: List[BoundingBox]) -> List[BoundingBox]:
|
| 120 |
+
"""
|
| 121 |
+
"""
|
| 122 |
+
gk_idxs = [i for i, bb in enumerate(boxes) if int(bb.cls_id) == 1]
|
| 123 |
+
player_idxs = [i for i, bb in enumerate(boxes) if int(bb.cls_id) == 2]
|
| 124 |
+
|
| 125 |
+
if len(gk_idxs) == 0 or len(player_idxs) < 2:
|
| 126 |
+
return boxes
|
| 127 |
+
|
| 128 |
+
updated_boxes = boxes.copy()
|
| 129 |
+
|
| 130 |
+
for gk_idx in gk_idxs:
|
| 131 |
+
if boxes[gk_idx].conf < 0.3:
|
| 132 |
+
updated_boxes[gk_idx].cls_id = 2
|
| 133 |
+
|
| 134 |
+
return updated_boxes
|
| 135 |
|
|
|
|
|
|
|
| 136 |
|
| 137 |
+
def _pre_process_img(self, frames: List[np.ndarray], scale: float = 640.0) -> np.ndarray:
|
| 138 |
+
"""
|
| 139 |
+
Preprocess images for ONNX inference.
|
| 140 |
+
|
| 141 |
+
Args:
|
| 142 |
+
frames: List of BGR frames
|
| 143 |
+
scale: Target scale for resizing
|
| 144 |
+
|
| 145 |
+
Returns:
|
| 146 |
+
Preprocessed numpy array ready for ONNX inference
|
| 147 |
+
"""
|
| 148 |
+
imgs = np.stack([cv2.resize(frame, (int(scale), int(scale))) for frame in frames])
|
| 149 |
+
imgs = imgs.transpose(0, 3, 1, 2) # BHWC to BCHW
|
| 150 |
+
imgs = imgs.astype(np.float32) / 255.0 # Normalize to [0, 1]
|
| 151 |
+
return imgs
|
| 152 |
|
| 153 |
+
def _post_process_output(self, outputs: np.ndarray, x_scale: float, y_scale: float,
|
| 154 |
+
conf_thresh: float = 0.6, nms_thresh: float = 0.55) -> List[List[Tuple]]:
|
| 155 |
+
"""
|
| 156 |
+
Post-process ONNX model outputs to get detections.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 157 |
|
| 158 |
+
Args:
|
| 159 |
+
outputs: Raw ONNX model outputs
|
| 160 |
+
x_scale: X-axis scaling factor
|
| 161 |
+
y_scale: Y-axis scaling factor
|
| 162 |
+
conf_thresh: Confidence threshold
|
| 163 |
+
nms_thresh: NMS threshold
|
| 164 |
+
|
| 165 |
+
Returns:
|
| 166 |
+
List of detections for each frame: [(box, conf, class_id), ...]
|
| 167 |
+
"""
|
| 168 |
+
B, C, N = outputs.shape
|
| 169 |
+
outputs = torch.from_numpy(outputs)
|
| 170 |
+
outputs = outputs.permute(0, 2, 1) # B,C,N -> B,N,C
|
| 171 |
+
|
| 172 |
+
boxes = outputs[..., :4]
|
| 173 |
+
class_scores = 1 / (1 + torch.exp(-outputs[..., 4:])) # Sigmoid activation
|
| 174 |
+
conf, class_id = class_scores.max(dim=2)
|
| 175 |
|
| 176 |
+
mask = conf > conf_thresh
|
| 177 |
+
|
| 178 |
+
# Special handling for balls - keep best one even with lower confidence
|
| 179 |
+
for i in range(class_id.shape[0]): # loop over batch
|
| 180 |
+
# Find detections that are balls
|
| 181 |
+
ball_mask = class_id[i] == 0
|
| 182 |
+
ball_idx = ball_mask.nonzero(as_tuple=True)[0]
|
| 183 |
+
if ball_idx.numel() > 0:
|
| 184 |
+
# Pick the one with the highest confidence
|
| 185 |
+
best_ball_idx = ball_idx[conf[i, ball_idx].argmax()]
|
| 186 |
+
if conf[i, best_ball_idx] >= 0.55: # apply confidence threshold
|
| 187 |
+
mask[i, best_ball_idx] = True
|
| 188 |
+
|
| 189 |
+
batch_idx, pred_idx = mask.nonzero(as_tuple=True)
|
| 190 |
|
| 191 |
+
if len(batch_idx) == 0:
|
| 192 |
+
return [[] for _ in range(B)]
|
| 193 |
+
|
| 194 |
+
boxes = boxes[batch_idx, pred_idx]
|
| 195 |
+
conf = conf[batch_idx, pred_idx]
|
| 196 |
+
class_id = class_id[batch_idx, pred_idx]
|
| 197 |
+
|
| 198 |
+
# Convert from center format to xyxy format
|
| 199 |
+
x, y, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
|
| 200 |
+
x1 = (x - w / 2) * x_scale
|
| 201 |
+
y1 = (y - h / 2) * y_scale
|
| 202 |
+
x2 = (x + w / 2) * x_scale
|
| 203 |
+
y2 = (y + h / 2) * y_scale
|
| 204 |
+
boxes_xyxy = torch.stack([x1, y1, x2, y2], dim=1)
|
| 205 |
+
|
| 206 |
+
# Apply batched NMS
|
| 207 |
+
max_coord = 1e4
|
| 208 |
+
offset = batch_idx.to(boxes_xyxy) * max_coord
|
| 209 |
+
boxes_for_nms = boxes_xyxy + offset[:, None]
|
| 210 |
+
|
| 211 |
+
keep = batched_nms(boxes_for_nms, conf, batch_idx, nms_thresh)
|
| 212 |
+
|
| 213 |
+
boxes_final = boxes_xyxy[keep]
|
| 214 |
+
conf_final = conf[keep]
|
| 215 |
+
class_final = class_id[keep]
|
| 216 |
+
batch_final = batch_idx[keep]
|
| 217 |
+
|
| 218 |
+
# Group results by batch
|
| 219 |
+
results = [[] for _ in range(B)]
|
| 220 |
+
for b in range(B):
|
| 221 |
+
mask_b = batch_final == b
|
| 222 |
+
if mask_b.sum() == 0:
|
| 223 |
continue
|
| 224 |
+
results[b] = list(zip(boxes_final[mask_b].numpy(),
|
| 225 |
+
conf_final[mask_b].numpy(),
|
| 226 |
+
class_final[mask_b].numpy()))
|
| 227 |
+
return results
|
| 228 |
+
|
| 229 |
+
def _ioa(self, a: BoundingBox, b: BoundingBox) -> float:
|
| 230 |
+
inter = self._intersect_area(a, b)
|
| 231 |
+
aa = self._area(a)
|
| 232 |
+
if aa <= 0:
|
| 233 |
+
return 0.0
|
| 234 |
+
return inter / aa
|
| 235 |
+
|
| 236 |
+
def suppress_small_contained(self, boxes: List[BoundingBox]) -> List[BoundingBox]:
|
| 237 |
+
if len(boxes) <= 1:
|
| 238 |
+
return boxes
|
| 239 |
+
keep = [True] * len(boxes)
|
| 240 |
+
areas = [self._area(bb) for bb in boxes]
|
| 241 |
+
for i in range(len(boxes)):
|
| 242 |
+
if not keep[i]:
|
| 243 |
+
continue
|
| 244 |
+
for j in range(len(boxes)):
|
| 245 |
+
if i == j or not keep[j]:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 246 |
continue
|
| 247 |
+
ai, aj = areas[i], areas[j]
|
| 248 |
+
if ai == 0 or aj == 0:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 249 |
continue
|
| 250 |
+
if ai <= aj:
|
| 251 |
+
ratio = ai / aj
|
| 252 |
+
if ratio <= self.SMALL_RATIO_MAX:
|
| 253 |
+
ioa_i_in_j = self._ioa(boxes[i], boxes[j])
|
| 254 |
+
if ioa_i_in_j >= self.SMALL_CONTAINED_IOA:
|
| 255 |
+
keep[i] = False
|
|
|
|
|
|
|
|
|
|
| 256 |
break
|
| 257 |
+
else:
|
| 258 |
+
ratio = aj / ai
|
| 259 |
+
if ratio <= self.SMALL_RATIO_MAX:
|
| 260 |
+
ioa_j_in_i = self._ioa(boxes[j], boxes[i])
|
| 261 |
+
if ioa_j_in_i >= self.SMALL_CONTAINED_IOA:
|
| 262 |
+
keep[j] = False
|
| 263 |
+
return [bb for bb, k in zip(boxes, keep) if k]
|
| 264 |
+
|
| 265 |
+
def _detect_objects_batch(self, batch_images: List[ndarray], offset: int) -> Dict[int, List[BoundingBox]]:
|
| 266 |
+
"""
|
| 267 |
+
Phase 1: Object detection for all frames in batch.
|
| 268 |
+
Returns detected objects with players still having class_id=2 (before team classification).
|
| 269 |
+
|
| 270 |
+
Args:
|
| 271 |
+
batch_images: List of images to process
|
| 272 |
+
offset: Frame offset for numbering
|
| 273 |
+
|
| 274 |
+
Returns:
|
| 275 |
+
Dictionary mapping frame_id to list of detected boxes
|
| 276 |
+
"""
|
| 277 |
+
bboxes: Dict[int, List[BoundingBox]] = {}
|
| 278 |
|
| 279 |
+
if len(batch_images) == 0:
|
| 280 |
+
return bboxes
|
| 281 |
+
|
| 282 |
+
print(f"Processing batch of {len(batch_images)} images")
|
| 283 |
+
|
| 284 |
+
# Get original image dimensions for scaling
|
| 285 |
+
height, width = batch_images[0].shape[:2]
|
| 286 |
+
scale = 640.0
|
| 287 |
+
x_scale = width / scale
|
| 288 |
+
y_scale = height / scale
|
| 289 |
+
|
| 290 |
+
# Memory optimization: Process smaller batches if needed
|
| 291 |
+
max_batch_size = 32 # Reduce batch size further to prevent memory issues
|
| 292 |
+
if len(batch_images) > max_batch_size:
|
| 293 |
+
print(f"Large batch detected ({len(batch_images)} images), splitting into smaller batches of {max_batch_size}")
|
| 294 |
+
# Process in smaller chunks
|
| 295 |
+
all_bboxes = {}
|
| 296 |
+
for chunk_start in range(0, len(batch_images), max_batch_size):
|
| 297 |
+
chunk_end = min(chunk_start + max_batch_size, len(batch_images))
|
| 298 |
+
chunk_images = batch_images[chunk_start:chunk_end]
|
| 299 |
+
chunk_offset = offset + chunk_start
|
| 300 |
+
print(f"Processing chunk {chunk_start//max_batch_size + 1}: images {chunk_start}-{chunk_end-1}")
|
| 301 |
+
chunk_bboxes = self._detect_objects_batch(chunk_images, chunk_offset)
|
| 302 |
+
all_bboxes.update(chunk_bboxes)
|
| 303 |
+
return all_bboxes
|
| 304 |
+
|
| 305 |
+
# Preprocess images for ONNX inference
|
| 306 |
+
imgs = self._pre_process_img(batch_images, scale)
|
| 307 |
+
actual_batch_size = len(batch_images)
|
| 308 |
+
|
| 309 |
+
# Handle batch size mismatch - pad if needed
|
| 310 |
+
model_batch_size = self.bbox_model.get_inputs()[0].shape[0]
|
| 311 |
+
print(f"Model input shape: {self.bbox_model.get_inputs()[0].shape}, batch_size: {model_batch_size}")
|
| 312 |
+
|
| 313 |
+
if model_batch_size is not None:
|
| 314 |
+
try:
|
| 315 |
+
# Handle dynamic batch size (None, -1, 'None')
|
| 316 |
+
if str(model_batch_size) in ['None', '-1'] or model_batch_size == -1:
|
| 317 |
+
model_batch_size = None
|
| 318 |
+
else:
|
| 319 |
+
model_batch_size = int(model_batch_size)
|
| 320 |
+
except (ValueError, TypeError):
|
| 321 |
+
model_batch_size = None
|
| 322 |
+
|
| 323 |
+
print(f"Processed model_batch_size: {model_batch_size}, actual_batch_size: {actual_batch_size}")
|
| 324 |
+
|
| 325 |
+
if model_batch_size and actual_batch_size < model_batch_size:
|
| 326 |
+
padding_size = model_batch_size - actual_batch_size
|
| 327 |
+
dummy_img = np.zeros((1, 3, int(scale), int(scale)), dtype=np.float32)
|
| 328 |
+
padding = np.repeat(dummy_img, padding_size, axis=0)
|
| 329 |
+
imgs = np.vstack([imgs, padding])
|
| 330 |
+
|
| 331 |
+
# ONNX inference with error handling
|
| 332 |
+
try:
|
| 333 |
+
input_name = self.bbox_model.get_inputs()[0].name
|
| 334 |
+
import time
|
| 335 |
+
start_time = time.time()
|
| 336 |
+
outputs = self.bbox_model.run(None, {input_name: imgs})[0]
|
| 337 |
+
inference_time = time.time() - start_time
|
| 338 |
+
print(f"Inference time: {inference_time:.3f}s for {actual_batch_size} images")
|
| 339 |
+
|
| 340 |
+
# Remove padded results if we added padding
|
| 341 |
+
if model_batch_size and isinstance(model_batch_size, int) and actual_batch_size < model_batch_size:
|
| 342 |
+
outputs = outputs[:actual_batch_size]
|
| 343 |
+
|
| 344 |
+
# Post-process outputs to get detections
|
| 345 |
+
raw_results = self._post_process_output(np.array(outputs), x_scale, y_scale)
|
| 346 |
+
|
| 347 |
+
except Exception as e:
|
| 348 |
+
print(f"Error during ONNX inference: {e}")
|
| 349 |
+
return bboxes
|
| 350 |
+
|
| 351 |
+
if not raw_results:
|
| 352 |
+
return bboxes
|
| 353 |
+
|
| 354 |
+
# Convert raw results to BoundingBox objects and apply processing
|
| 355 |
+
for frame_idx_in_batch, frame_detections in enumerate(raw_results):
|
| 356 |
+
if not frame_detections:
|
| 357 |
+
continue
|
| 358 |
+
|
| 359 |
+
# Convert to BoundingBox objects
|
| 360 |
+
boxes: List[BoundingBox] = []
|
| 361 |
+
for box, conf, cls_id in frame_detections:
|
| 362 |
+
x1, y1, x2, y2 = box
|
| 363 |
+
if int(cls_id) < 4:
|
| 364 |
boxes.append(
|
| 365 |
BoundingBox(
|
| 366 |
x1=int(x1),
|
| 367 |
y1=int(y1),
|
| 368 |
x2=int(x2),
|
| 369 |
y2=int(y2),
|
| 370 |
+
cls_id=int(cls_id),
|
| 371 |
conf=float(conf),
|
| 372 |
)
|
| 373 |
)
|
| 374 |
+
|
| 375 |
# Handle footballs - keep only the best one
|
| 376 |
footballs = [bb for bb in boxes if int(bb.cls_id) == 0]
|
| 377 |
if len(footballs) > 1:
|
| 378 |
best_ball = max(footballs, key=lambda b: b.conf)
|
| 379 |
boxes = [bb for bb in boxes if int(bb.cls_id) != 0]
|
| 380 |
boxes.append(best_ball)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 381 |
|
| 382 |
+
# Remove overlapping small boxes
|
| 383 |
+
boxes = suppress_small_contained_boxes(boxes, self.SMALL_CONTAINED_IOA, self.SMALL_RATIO_MAX)
|
| 384 |
+
|
| 385 |
+
# Handle goalkeeper detection issues:
|
| 386 |
+
# 1. Fix misplaced goalkeepers (convert to players if standing in middle)
|
| 387 |
+
# 2. Allow up to 2 goalkeepers maximum (one from each team)
|
| 388 |
+
# Goalkeepers remain class_id = 1 (no team assignment)
|
| 389 |
+
boxes = self._handle_multiple_goalkeepers(boxes)
|
| 390 |
+
|
| 391 |
+
# Store results (players still have class_id=2, will be classified in phase 2)
|
| 392 |
+
frame_id = offset + frame_idx_in_batch
|
| 393 |
+
bboxes[frame_id] = boxes
|
| 394 |
+
|
| 395 |
+
return bboxes
|
| 396 |
|
|
|
|
|
|
|
| 397 |
|
| 398 |
+
def predict_batch(
|
| 399 |
+
self,
|
| 400 |
+
batch_images: List[ndarray],
|
| 401 |
+
offset: int,
|
| 402 |
+
n_keypoints: int,
|
| 403 |
+
task_type: Optional[str] = None,
|
| 404 |
+
) -> List[TVFrameResult]:
|
| 405 |
+
process_objects = task_type is None or task_type == "object"
|
| 406 |
+
process_keypoints = task_type is None or task_type == "keypoint"
|
| 407 |
+
|
| 408 |
+
# Phase 1: Object Detection for all frames
|
| 409 |
+
bboxes: Dict[int, List[BoundingBox]] = {}
|
| 410 |
+
if process_objects:
|
| 411 |
+
bboxes = self._detect_objects_batch(batch_images, offset)
|
| 412 |
+
|
| 413 |
+
import time
|
| 414 |
+
time_start = time.time()
|
| 415 |
+
# Phase 2: Team Classification for all detected players
|
| 416 |
+
if process_objects and bboxes:
|
| 417 |
+
bboxes, self.team_classifier_fitted, self.player_crops_for_fit = classify_teams_batch(
|
| 418 |
+
self.team_classifier,
|
| 419 |
+
self.team_classifier_fitted,
|
| 420 |
+
self.player_crops_for_fit,
|
| 421 |
+
batch_images,
|
| 422 |
+
bboxes,
|
| 423 |
+
offset,
|
| 424 |
+
self.MIN_SAMPLES_FOR_FIT,
|
| 425 |
+
self.MAX_SAMPLES_FOR_FIT,
|
| 426 |
+
self.SINGLE_PLAYER_HUE_PIVOT
|
| 427 |
)
|
| 428 |
+
self.team_classifier_fitted = False
|
| 429 |
+
self.player_crops_for_fit = []
|
| 430 |
+
print(f"Time Team Classification: {time.time() - time_start} s")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 431 |
|
| 432 |
+
# Phase 3: Keypoint Detection
|
| 433 |
+
keypoints: Dict[int, List[Tuple[int, int]]] = {}
|
| 434 |
+
if process_keypoints:
|
| 435 |
+
keypoints = self._detect_keypoints_batch(batch_images, offset, n_keypoints)
|
| 436 |
+
|
| 437 |
+
# Phase 4: Combine results
|
| 438 |
results: List[TVFrameResult] = []
|
| 439 |
for frame_number in range(offset, offset + len(batch_images)):
|
| 440 |
+
results.append(
|
| 441 |
+
TVFrameResult(
|
| 442 |
+
frame_id=frame_number,
|
| 443 |
+
boxes=bboxes.get(frame_number, []),
|
| 444 |
+
keypoints=keypoints.get(
|
| 445 |
+
frame_number,
|
| 446 |
+
[(0, 0) for _ in range(n_keypoints)],
|
| 447 |
+
),
|
| 448 |
+
)
|
| 449 |
)
|
| 450 |
+
return results
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 451 |
|
| 452 |
+
def _detect_keypoints_batch(self, batch_images: List[ndarray],
|
| 453 |
+
offset: int, n_keypoints: int) -> Dict[int, List[Tuple[int, int]]]:
|
| 454 |
+
"""
|
| 455 |
+
Phase 3: Keypoint detection for all frames in batch.
|
| 456 |
+
|
| 457 |
+
Args:
|
| 458 |
+
batch_images: List of images to process
|
| 459 |
+
offset: Frame offset for numbering
|
| 460 |
+
n_keypoints: Number of keypoints expected
|
| 461 |
+
|
| 462 |
+
Returns:
|
| 463 |
+
Dictionary mapping frame_id to list of keypoint coordinates
|
| 464 |
+
"""
|
| 465 |
+
keypoints: Dict[int, List[Tuple[int, int]]] = {}
|
| 466 |
+
keypoints_model_results = self.keypoints_model.predict(batch_images)
|
| 467 |
+
|
| 468 |
+
if keypoints_model_results is None:
|
| 469 |
+
return keypoints
|
| 470 |
+
|
| 471 |
+
for frame_idx_in_batch, detection in enumerate(keypoints_model_results):
|
| 472 |
+
if not hasattr(detection, "keypoints") or detection.keypoints is None:
|
| 473 |
+
continue
|
| 474 |
+
|
| 475 |
+
# Extract keypoints with confidence
|
| 476 |
+
frame_keypoints_with_conf: List[Tuple[int, int, float]] = []
|
| 477 |
+
for i, part_points in enumerate(detection.keypoints.data):
|
| 478 |
+
for k_id, (x, y, _) in enumerate(part_points):
|
| 479 |
+
confidence = float(detection.keypoints.conf[i][k_id])
|
| 480 |
+
frame_keypoints_with_conf.append((int(x), int(y), confidence))
|
| 481 |
+
|
| 482 |
+
# Pad or truncate to expected number of keypoints
|
| 483 |
+
if len(frame_keypoints_with_conf) < n_keypoints:
|
| 484 |
+
frame_keypoints_with_conf.extend(
|
| 485 |
+
[(0, 0, 0.0)] * (n_keypoints - len(frame_keypoints_with_conf))
|
| 486 |
+
)
|
| 487 |
+
else:
|
| 488 |
+
frame_keypoints_with_conf = frame_keypoints_with_conf[:n_keypoints]
|
| 489 |
+
|
| 490 |
+
# Filter keypoints based on confidence thresholds
|
| 491 |
+
filtered_keypoints: List[Tuple[int, int]] = []
|
| 492 |
+
for idx, (x, y, confidence) in enumerate(frame_keypoints_with_conf):
|
| 493 |
+
if idx in self.CORNER_INDICES:
|
| 494 |
+
# Corner keypoints have lower confidence threshold
|
| 495 |
+
if confidence < 0.3:
|
| 496 |
+
filtered_keypoints.append((0, 0))
|
| 497 |
+
else:
|
| 498 |
+
filtered_keypoints.append((int(x), int(y)))
|
| 499 |
+
else:
|
| 500 |
+
# Regular keypoints
|
| 501 |
+
if confidence < 0.5:
|
| 502 |
+
filtered_keypoints.append((0, 0))
|
| 503 |
+
else:
|
| 504 |
+
filtered_keypoints.append((int(x), int(y)))
|
| 505 |
+
|
| 506 |
+
frame_id = offset + frame_idx_in_batch
|
| 507 |
+
keypoints[frame_id] = filtered_keypoints
|
| 508 |
+
|
| 509 |
+
return keypoints
|
object-detection.onnx
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:05112479be8cb59494e9ae23a57af43becd5aa1f448b0e5ed33fcb6b4c2bbbc3
|
| 3 |
+
size 273322667
|
pitch.py
CHANGED
|
@@ -520,7 +520,7 @@ def run_inference(model, input_tensor: torch.Tensor, device):
|
|
| 520 |
output = model.module().forward(input_tensor)
|
| 521 |
return output
|
| 522 |
|
| 523 |
-
def preprocess_batch_fast(frames):
|
| 524 |
"""Ultra-fast batch preprocessing using optimized tensor operations"""
|
| 525 |
target_size = (540, 960) # H, W format for model input
|
| 526 |
batch = []
|
|
@@ -530,7 +530,7 @@ def preprocess_batch_fast(frames):
|
|
| 530 |
img = img.astype(np.float32) / 255.0
|
| 531 |
img = np.transpose(img, (2, 0, 1)) # HWC -> CHW
|
| 532 |
batch.append(img)
|
| 533 |
-
batch = torch.
|
| 534 |
|
| 535 |
return batch
|
| 536 |
|
|
@@ -610,24 +610,16 @@ def inference_batch(frames, model, kp_threshold, device, batch_size=8):
|
|
| 610 |
results = []
|
| 611 |
num_frames = len(frames)
|
| 612 |
|
| 613 |
-
# Get the device from the model itself
|
| 614 |
-
model_device = next(model.parameters()).device
|
| 615 |
-
|
| 616 |
# Process all frames in optimally-sized batches
|
| 617 |
for i in range(0, num_frames, batch_size):
|
| 618 |
current_batch_size = min(batch_size, num_frames - i)
|
| 619 |
batch_frames = frames[i:i + current_batch_size]
|
| 620 |
|
| 621 |
-
# Fast preprocessing
|
| 622 |
-
batch = preprocess_batch_fast(batch_frames)
|
| 623 |
-
b, c, h, w = batch.size()
|
| 624 |
-
|
| 625 |
-
# Move batch to model device
|
| 626 |
-
batch = batch.to(model_device)
|
| 627 |
-
|
| 628 |
-
with torch.no_grad():
|
| 629 |
-
heatmaps = model(batch)
|
| 630 |
|
|
|
|
|
|
|
| 631 |
# Ultra-fast keypoint extraction
|
| 632 |
kp_coords = extract_keypoints_from_heatmap_fast(heatmaps[:,:-1,:,:], scale=2, max_keypoints=1)
|
| 633 |
|
|
@@ -660,10 +652,28 @@ def get_mapped_keypoints(kp_points):
|
|
| 660 |
# mapped_points[key] = value
|
| 661 |
return mapped_points
|
| 662 |
|
| 663 |
-
def process_batch_input(frames, model, kp_threshold, device, batch_size=
|
| 664 |
"""Process multiple input images in batch"""
|
| 665 |
# Batch inference
|
| 666 |
kp_results = inference_batch(frames, model, kp_threshold, device, batch_size)
|
| 667 |
kp_results = [get_mapped_keypoints(kp) for kp in kp_results]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 668 |
|
| 669 |
return kp_results
|
|
|
|
| 520 |
output = model.module().forward(input_tensor)
|
| 521 |
return output
|
| 522 |
|
| 523 |
+
def preprocess_batch_fast(frames, device):
|
| 524 |
"""Ultra-fast batch preprocessing using optimized tensor operations"""
|
| 525 |
target_size = (540, 960) # H, W format for model input
|
| 526 |
batch = []
|
|
|
|
| 530 |
img = img.astype(np.float32) / 255.0
|
| 531 |
img = np.transpose(img, (2, 0, 1)) # HWC -> CHW
|
| 532 |
batch.append(img)
|
| 533 |
+
batch = torch.tensor(np.stack(batch), dtype=torch.float32)
|
| 534 |
|
| 535 |
return batch
|
| 536 |
|
|
|
|
| 610 |
results = []
|
| 611 |
num_frames = len(frames)
|
| 612 |
|
|
|
|
|
|
|
|
|
|
| 613 |
# Process all frames in optimally-sized batches
|
| 614 |
for i in range(0, num_frames, batch_size):
|
| 615 |
current_batch_size = min(batch_size, num_frames - i)
|
| 616 |
batch_frames = frames[i:i + current_batch_size]
|
| 617 |
|
| 618 |
+
# Fast preprocessing
|
| 619 |
+
batch = preprocess_batch_fast(batch_frames, device)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 620 |
|
| 621 |
+
heatmaps = run_inference(model, batch, device)
|
| 622 |
+
|
| 623 |
# Ultra-fast keypoint extraction
|
| 624 |
kp_coords = extract_keypoints_from_heatmap_fast(heatmaps[:,:-1,:,:], scale=2, max_keypoints=1)
|
| 625 |
|
|
|
|
| 652 |
# mapped_points[key] = value
|
| 653 |
return mapped_points
|
| 654 |
|
| 655 |
+
def process_batch_input(frames, model, kp_threshold, device, batch_size=8):
|
| 656 |
"""Process multiple input images in batch"""
|
| 657 |
# Batch inference
|
| 658 |
kp_results = inference_batch(frames, model, kp_threshold, device, batch_size)
|
| 659 |
kp_results = [get_mapped_keypoints(kp) for kp in kp_results]
|
| 660 |
+
# Draw results and save
|
| 661 |
+
# for i, (frame, kp_points, input_path) in enumerate(zip(frames, kp_results, valid_paths)):
|
| 662 |
+
# height, width = frame.shape[:2]
|
| 663 |
+
|
| 664 |
+
# # Apply mapping to get standard keypoint IDs
|
| 665 |
+
# mapped_kp_points = get_mapped_keypoints(kp_points)
|
| 666 |
+
|
| 667 |
+
# for key, value in mapped_kp_points.items():
|
| 668 |
+
# x = int(value['x'] * width)
|
| 669 |
+
# y = int(value['y'] * height)
|
| 670 |
+
# cv2.circle(frame, (x, y), 5, (0, 255, 0), -1) # Green circles
|
| 671 |
+
# cv2.putText(frame, str(key), (x+10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
|
| 672 |
+
|
| 673 |
+
# # Save result
|
| 674 |
+
# output_path = input_path.replace('.png', '_result.png').replace('.jpg', '_result.jpg')
|
| 675 |
+
# cv2.imwrite(output_path, frame)
|
| 676 |
+
|
| 677 |
+
# print(f"Batch processing complete. Processed {len(frames)} images.")
|
| 678 |
|
| 679 |
return kp_results
|
player.py
ADDED
|
@@ -0,0 +1,388 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import cv2
|
| 2 |
+
import numpy as np
|
| 3 |
+
from sklearn.cluster import KMeans
|
| 4 |
+
import warnings
|
| 5 |
+
import time
|
| 6 |
+
|
| 7 |
+
import torch
|
| 8 |
+
from torchvision.ops import batched_nms
|
| 9 |
+
from numpy import ndarray
|
| 10 |
+
# Suppress ALL runtime and sklearn warnings
|
| 11 |
+
warnings.filterwarnings('ignore', category=RuntimeWarning)
|
| 12 |
+
warnings.filterwarnings('ignore', category=FutureWarning)
|
| 13 |
+
warnings.filterwarnings('ignore', category=UserWarning)
|
| 14 |
+
|
| 15 |
+
# Suppress sklearn warnings specifically
|
| 16 |
+
import logging
|
| 17 |
+
logging.getLogger('sklearn').setLevel(logging.ERROR)
|
| 18 |
+
|
| 19 |
+
def get_grass_color(img):
|
| 20 |
+
# Convert image to HSV color space
|
| 21 |
+
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
|
| 22 |
+
|
| 23 |
+
# Define range of green color in HSV
|
| 24 |
+
lower_green = np.array([30, 40, 40])
|
| 25 |
+
upper_green = np.array([80, 255, 255])
|
| 26 |
+
|
| 27 |
+
# Threshold the HSV image to get only green colors
|
| 28 |
+
mask = cv2.inRange(hsv, lower_green, upper_green)
|
| 29 |
+
|
| 30 |
+
# Calculate the mean value of the pixels that are not masked
|
| 31 |
+
masked_img = cv2.bitwise_and(img, img, mask=mask)
|
| 32 |
+
grass_color = cv2.mean(img, mask=mask)
|
| 33 |
+
return grass_color[:3]
|
| 34 |
+
|
| 35 |
+
def get_players_boxes(frame, result):
|
| 36 |
+
players_imgs = []
|
| 37 |
+
players_boxes = []
|
| 38 |
+
for (box, score, cls) in result:
|
| 39 |
+
label = int(cls)
|
| 40 |
+
if label == 0:
|
| 41 |
+
x1, y1, x2, y2 = box.astype(int)
|
| 42 |
+
player_img = frame[y1: y2, x1: x2]
|
| 43 |
+
players_imgs.append(player_img)
|
| 44 |
+
players_boxes.append([box, score, cls])
|
| 45 |
+
return players_imgs, players_boxes
|
| 46 |
+
|
| 47 |
+
def get_kits_colors(players, grass_hsv=None, frame=None):
|
| 48 |
+
kits_colors = []
|
| 49 |
+
if grass_hsv is None:
|
| 50 |
+
grass_color = get_grass_color(frame)
|
| 51 |
+
grass_hsv = cv2.cvtColor(np.uint8([[list(grass_color)]]), cv2.COLOR_BGR2HSV)
|
| 52 |
+
|
| 53 |
+
for player_img in players:
|
| 54 |
+
# Skip empty or invalid images
|
| 55 |
+
if player_img is None or player_img.size == 0 or len(player_img.shape) != 3:
|
| 56 |
+
continue
|
| 57 |
+
|
| 58 |
+
# Convert image to HSV color space
|
| 59 |
+
hsv = cv2.cvtColor(player_img, cv2.COLOR_BGR2HSV)
|
| 60 |
+
|
| 61 |
+
# Define range of green color in HSV
|
| 62 |
+
lower_green = np.array([grass_hsv[0, 0, 0] - 10, 40, 40])
|
| 63 |
+
upper_green = np.array([grass_hsv[0, 0, 0] + 10, 255, 255])
|
| 64 |
+
|
| 65 |
+
# Threshold the HSV image to get only green colors
|
| 66 |
+
mask = cv2.inRange(hsv, lower_green, upper_green)
|
| 67 |
+
|
| 68 |
+
# Bitwise-AND mask and original image
|
| 69 |
+
mask = cv2.bitwise_not(mask)
|
| 70 |
+
upper_mask = np.zeros(player_img.shape[:2], np.uint8)
|
| 71 |
+
upper_mask[0:player_img.shape[0]//2, 0:player_img.shape[1]] = 255
|
| 72 |
+
mask = cv2.bitwise_and(mask, upper_mask)
|
| 73 |
+
|
| 74 |
+
kit_color = np.array(cv2.mean(player_img, mask=mask)[:3])
|
| 75 |
+
|
| 76 |
+
kits_colors.append(kit_color)
|
| 77 |
+
return kits_colors
|
| 78 |
+
|
| 79 |
+
def get_kits_classifier(kits_colors):
|
| 80 |
+
if len(kits_colors) == 0:
|
| 81 |
+
return None
|
| 82 |
+
if len(kits_colors) == 1:
|
| 83 |
+
# Only one kit color, create a dummy classifier
|
| 84 |
+
return None
|
| 85 |
+
kits_kmeans = KMeans(n_clusters=2)
|
| 86 |
+
kits_kmeans.fit(kits_colors)
|
| 87 |
+
return kits_kmeans
|
| 88 |
+
|
| 89 |
+
def classify_kits(kits_classifer, kits_colors):
|
| 90 |
+
if kits_classifer is None or len(kits_colors) == 0:
|
| 91 |
+
return np.array([0]) # Default to team 0
|
| 92 |
+
team = kits_classifer.predict(kits_colors)
|
| 93 |
+
return team
|
| 94 |
+
|
| 95 |
+
def get_left_team_label(players_boxes, kits_colors, kits_clf):
|
| 96 |
+
left_team_label = 0
|
| 97 |
+
team_0 = []
|
| 98 |
+
team_1 = []
|
| 99 |
+
|
| 100 |
+
for i in range(len(players_boxes)):
|
| 101 |
+
x1, y1, x2, y2 = players_boxes[i][0].astype(int)
|
| 102 |
+
team = classify_kits(kits_clf, [kits_colors[i]]).item()
|
| 103 |
+
if team == 0:
|
| 104 |
+
team_0.append(np.array([x1]))
|
| 105 |
+
else:
|
| 106 |
+
team_1.append(np.array([x1]))
|
| 107 |
+
|
| 108 |
+
team_0 = np.array(team_0)
|
| 109 |
+
team_1 = np.array(team_1)
|
| 110 |
+
|
| 111 |
+
# Safely calculate averages with fallback for empty arrays
|
| 112 |
+
avg_team_0 = np.average(team_0) if len(team_0) > 0 else 0
|
| 113 |
+
avg_team_1 = np.average(team_1) if len(team_1) > 0 else 0
|
| 114 |
+
|
| 115 |
+
if avg_team_0 - avg_team_1 > 0:
|
| 116 |
+
left_team_label = 1
|
| 117 |
+
|
| 118 |
+
return left_team_label
|
| 119 |
+
|
| 120 |
+
def check_box_boundaries(boxes, img_height, img_width):
|
| 121 |
+
"""
|
| 122 |
+
Check if bounding boxes are within image boundaries and clip them if necessary.
|
| 123 |
+
|
| 124 |
+
Args:
|
| 125 |
+
boxes: numpy array of shape (N, 4) with [x1, y1, x2, y2] format
|
| 126 |
+
img_height: height of the image
|
| 127 |
+
img_width: width of the image
|
| 128 |
+
|
| 129 |
+
Returns:
|
| 130 |
+
valid_boxes: numpy array of valid boxes within boundaries
|
| 131 |
+
valid_indices: indices of valid boxes
|
| 132 |
+
"""
|
| 133 |
+
x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
|
| 134 |
+
|
| 135 |
+
# Check if boxes are within boundaries
|
| 136 |
+
valid_mask = (x1 >= 0) & (y1 >= 0) & (x2 < img_width) & (y2 < img_height) & (x1 < x2) & (y1 < y2)
|
| 137 |
+
|
| 138 |
+
if not np.any(valid_mask):
|
| 139 |
+
return np.array([]), np.array([])
|
| 140 |
+
|
| 141 |
+
valid_boxes = boxes[valid_mask]
|
| 142 |
+
valid_indices = np.where(valid_mask)[0]
|
| 143 |
+
|
| 144 |
+
# Clip boxes to image boundaries
|
| 145 |
+
valid_boxes[:, 0] = np.clip(valid_boxes[:, 0], 0, img_width - 1) # x1
|
| 146 |
+
valid_boxes[:, 1] = np.clip(valid_boxes[:, 1], 0, img_height - 1) # y1
|
| 147 |
+
valid_boxes[:, 2] = np.clip(valid_boxes[:, 2], 0, img_width - 1) # x2
|
| 148 |
+
valid_boxes[:, 3] = np.clip(valid_boxes[:, 3], 0, img_height - 1) # y2
|
| 149 |
+
|
| 150 |
+
return valid_boxes, valid_indices
|
| 151 |
+
|
| 152 |
+
def process_team_identification_batch(frames, results, kits_clf, left_team_label, grass_hsv):
|
| 153 |
+
"""
|
| 154 |
+
Process team identification and label formatting for batch results.
|
| 155 |
+
|
| 156 |
+
Args:
|
| 157 |
+
frames: list of frames
|
| 158 |
+
results: list of detection results for each frame
|
| 159 |
+
kits_clf: trained kit classifier
|
| 160 |
+
left_team_label: label for left team
|
| 161 |
+
grass_hsv: grass color in HSV format
|
| 162 |
+
|
| 163 |
+
Returns:
|
| 164 |
+
processed_results: list of processed results with team identification
|
| 165 |
+
"""
|
| 166 |
+
processed_results = []
|
| 167 |
+
|
| 168 |
+
for frame_idx, frame in enumerate(frames):
|
| 169 |
+
frame_results = []
|
| 170 |
+
frame_detections = results[frame_idx]
|
| 171 |
+
|
| 172 |
+
if not frame_detections:
|
| 173 |
+
processed_results.append([])
|
| 174 |
+
continue
|
| 175 |
+
|
| 176 |
+
# Extract player boxes and images
|
| 177 |
+
players_imgs = []
|
| 178 |
+
players_boxes = []
|
| 179 |
+
player_indices = []
|
| 180 |
+
|
| 181 |
+
for idx, (box, score, cls) in enumerate(frame_detections):
|
| 182 |
+
label = int(cls)
|
| 183 |
+
if label == 0: # Player detection
|
| 184 |
+
x1, y1, x2, y2 = box.astype(int)
|
| 185 |
+
|
| 186 |
+
# Check boundaries
|
| 187 |
+
if (x1 >= 0 and y1 >= 0 and x2 < frame.shape[1] and y2 < frame.shape[0] and x1 < x2 and y1 < y2):
|
| 188 |
+
player_img = frame[y1:y2, x1:x2]
|
| 189 |
+
if player_img.size > 0: # Ensure valid image
|
| 190 |
+
players_imgs.append(player_img)
|
| 191 |
+
players_boxes.append([box, score, cls])
|
| 192 |
+
player_indices.append(idx)
|
| 193 |
+
|
| 194 |
+
# Initialize player team mapping
|
| 195 |
+
player_team_map = {}
|
| 196 |
+
|
| 197 |
+
# Process team identification if we have players
|
| 198 |
+
if players_imgs and kits_clf is not None:
|
| 199 |
+
kits_colors = get_kits_colors(players_imgs, grass_hsv)
|
| 200 |
+
teams = classify_kits(kits_clf, kits_colors)
|
| 201 |
+
|
| 202 |
+
# Create mapping from player index to team
|
| 203 |
+
for i, team in enumerate(teams):
|
| 204 |
+
player_team_map[player_indices[i]] = team.item()
|
| 205 |
+
|
| 206 |
+
id = 0
|
| 207 |
+
# Process all detections with team identification
|
| 208 |
+
for idx, (box, score, cls) in enumerate(frame_detections):
|
| 209 |
+
label = int(cls)
|
| 210 |
+
x1, y1, x2, y2 = box.astype(int)
|
| 211 |
+
|
| 212 |
+
# Check boundaries
|
| 213 |
+
valid_boxes, valid_indices = check_box_boundaries(
|
| 214 |
+
np.array([[x1, y1, x2, y2]]), frame.shape[0], frame.shape[1]
|
| 215 |
+
)
|
| 216 |
+
|
| 217 |
+
if len(valid_boxes) == 0:
|
| 218 |
+
continue
|
| 219 |
+
|
| 220 |
+
x1, y1, x2, y2 = valid_boxes[0].astype(int)
|
| 221 |
+
|
| 222 |
+
# Apply team identification logic
|
| 223 |
+
if label == 0: # Player
|
| 224 |
+
if players_imgs and kits_clf is not None and idx in player_team_map:
|
| 225 |
+
team = player_team_map[idx]
|
| 226 |
+
if team == left_team_label:
|
| 227 |
+
final_label = 6 # Player-L (Left team)
|
| 228 |
+
else:
|
| 229 |
+
final_label = 7 # Player-R (Right team)
|
| 230 |
+
else:
|
| 231 |
+
final_label = 6 # Default player label
|
| 232 |
+
|
| 233 |
+
elif label == 1: # Goalkeeper
|
| 234 |
+
final_label = 1 # GK
|
| 235 |
+
|
| 236 |
+
elif label == 2: # Ball
|
| 237 |
+
final_label = 0 # Ball
|
| 238 |
+
|
| 239 |
+
elif label == 3 or label == 4: # Referee or other
|
| 240 |
+
final_label = 3 # Referee
|
| 241 |
+
|
| 242 |
+
else:
|
| 243 |
+
final_label = int(label) # Keep original label, ensure it's int
|
| 244 |
+
|
| 245 |
+
frame_results.append({
|
| 246 |
+
"id": int(id),
|
| 247 |
+
"bbox": [int(x1), int(y1), int(x2), int(y2)],
|
| 248 |
+
"class_id": int(final_label),
|
| 249 |
+
"conf": float(score)
|
| 250 |
+
})
|
| 251 |
+
id = id + 1
|
| 252 |
+
|
| 253 |
+
processed_results.append(frame_results)
|
| 254 |
+
|
| 255 |
+
return processed_results
|
| 256 |
+
|
| 257 |
+
def convert_numpy_types(obj):
    """Recursively convert numpy scalars/arrays to native Python types for JSON serialization.

    Handles numpy integer, floating, and boolean scalars, ndarrays, and
    arbitrarily nested dicts/lists containing them. Any other object is
    returned unchanged.

    Args:
        obj: Value possibly containing numpy types.

    Returns:
        An equivalent structure built only from native Python types.
    """
    if isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    # np.bool_ is neither np.integer nor np.floating and is not JSON
    # serializable; convert it explicitly.
    elif isinstance(obj, np.bool_):
        return bool(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, dict):
        return {key: convert_numpy_types(value) for key, value in obj.items()}
    elif isinstance(obj, list):
        return [convert_numpy_types(item) for item in obj]
    else:
        return obj
|
| 271 |
+
|
| 272 |
+
def pre_process_img(frames, scale):
    """Resize frames to a square model input and pack them into an NCHW float32 batch.

    Args:
        frames: Iterable of HxWx3 images (presumably BGR from cv2 capture — confirm
            against the model's expected channel order).
        scale: Target side length; each frame is resized to (scale, scale),
            so aspect ratio is not preserved.

    Returns:
        np.ndarray of shape (N, 3, scale, scale), float32, scaled to [0, 1].
    """
    side = int(scale)
    resized = [cv2.resize(img, (side, side)) for img in frames]
    batch = np.stack(resized)            # (N, H, W, C)
    batch = batch.transpose(0, 3, 1, 2)  # -> (N, C, H, W)
    return batch.astype(np.float32) / 255.0
|
| 277 |
+
|
| 278 |
+
def post_process_output(outputs, x_scale, y_scale, conf_thresh=0.6, nms_thresh=0.75):
    """Decode raw YOLO-style detector output into per-frame detection lists.

    Args:
        outputs: np.ndarray of shape (B, C, N) — B frames, C = 4 box coords
            plus per-class logits, N candidate predictions.
        x_scale, y_scale: Factors mapping model-input coordinates back to the
            original frame size.
        conf_thresh: Minimum class confidence to keep a detection.
        nms_thresh: IoU threshold for non-maximum suppression.

    Returns:
        List of length B; entry b is a list of (box_xyxy, conf, class_id)
        tuples (numpy values) for frame b, or [] if nothing survived.
    """
    B, C, N = outputs.shape
    outputs = torch.from_numpy(outputs)
    # (B, C, N) -> (B, N, C): one row per candidate prediction.
    outputs = outputs.permute(0, 2, 1)
    boxes = outputs[..., :4]
    # Sigmoid over class logits; best class per candidate.
    class_scores = 1 / (1 + torch.exp(-outputs[..., 4:]))
    conf, class_id = class_scores.max(dim=2)

    mask = conf > conf_thresh

    # Ball rescue: the ball (class 2) is small and often scores below
    # conf_thresh, so keep the single best ball candidate per frame if it
    # clears a lower 0.55 bar.
    for i in range(class_id.shape[0]):  # loop over batch
        # Find detections that are balls
        ball_idx = np.where(class_id[i] == 2)[0]
        if ball_idx.size > 0:
            # Pick the one with the highest confidence
            top = ball_idx[np.argmax(conf[i, ball_idx])]
            if conf[i, top] > 0.55:  # apply confidence threshold
                mask[i, top] = True

    # ball_mask = (class_id == 2) & (conf > 0.51)
    # mask = mask | ball_mask

    batch_idx, pred_idx = mask.nonzero(as_tuple=True)

    if len(batch_idx) == 0:
        return [[] for _ in range(B)]

    # Flatten surviving detections across the batch.
    boxes = boxes[batch_idx, pred_idx]
    conf = conf[batch_idx, pred_idx]
    class_id = class_id[batch_idx, pred_idx]

    # Convert center-x/center-y/width/height to x1y1x2y2 and rescale to the
    # original frame resolution.
    x, y, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = (x - w / 2) * x_scale
    y1 = (y - h / 2) * y_scale
    x2 = (x + w / 2) * x_scale
    y2 = (y + h / 2) * y_scale
    boxes_xyxy = torch.stack([x1, y1, x2, y2], dim=1)

    # Offset each frame's boxes by a large per-frame constant so boxes from
    # different frames can never overlap during NMS. NOTE(review): batched_nms
    # already separates by batch_idx, so this offset looks redundant (though
    # harmless) — confirm which batched_nms implementation is imported.
    max_coord = 1e4
    offset = batch_idx.to(boxes_xyxy) * max_coord
    boxes_for_nms = boxes_xyxy + offset[:, None]

    keep = batched_nms(boxes_for_nms, conf, batch_idx, nms_thresh)

    boxes_final = boxes_xyxy[keep]
    conf_final = conf[keep]
    class_final = class_id[keep]
    batch_final = batch_idx[keep]

    # Regroup kept detections by originating frame.
    results = [[] for _ in range(B)]
    for b in range(B):
        mask_b = batch_final == b
        if mask_b.sum() == 0:
            continue
        results[b] = list(zip(boxes_final[mask_b].numpy(),
                              conf_final[mask_b].numpy(),
                              class_final[mask_b].numpy()))
    return results
|
| 336 |
+
|
| 337 |
+
def player_detection_result(frames: list[ndarray], batch_size, model, kits_clf=None, left_team_label=None, grass_hsv=None):
    """Run batched object detection over frames and tag players with team labels.

    Args:
        frames: BGR frames; all are assumed to share the first frame's size.
        batch_size: Number of frames per inference batch.
        model: ONNX Runtime InferenceSession-like object exposing
            ``get_inputs()`` and ``run()``.
        kits_clf: Optional pre-fitted kit-color classifier; built lazily from
            the first usable frame when None.
        left_team_label: Optional classifier label of the left-side team.
        grass_hsv: Optional HSV grass-color sample used to mask the pitch.

    Returns:
        Tuple ``(results, kits_clf, left_team_label, grass_hsv)``: one
        detection-dict list per frame, plus the (possibly lazily initialized)
        team-classification state so callers can reuse it on later calls.
    """
    height, width = frames[0].shape[:2]
    scale = 640.0  # model input side length
    x_scale = width / scale
    y_scale = height / scale

    results = []
    for start in range(0, len(frames), batch_size):
        # Python slicing clamps at the sequence end, so the final batch is
        # simply shorter — no need to adjust batch_size.
        batch_frames = frames[start:start + batch_size]
        imgs = pre_process_img(batch_frames, scale)

        input_name = model.get_inputs()[0].name
        outputs = model.run(None, {input_name: imgs})[0]
        raw_results = post_process_output(np.array(outputs), x_scale, y_scale)

        # Lazily initialize team classification from the first frame that
        # yields usable player crops and kit colors.
        if kits_clf is None or left_team_label is None or grass_hsv is None:
            first_frame = batch_frames[0]
            first_frame_results = raw_results[0] if raw_results else []

            if first_frame_results:
                players_imgs, players_boxes = get_players_boxes(first_frame, first_frame_results)
                if players_imgs:
                    grass_color = get_grass_color(first_frame)
                    grass_hsv = cv2.cvtColor(np.uint8([[list(grass_color)]]), cv2.COLOR_BGR2HSV)
                    kits_colors = get_kits_colors(players_imgs, grass_hsv)
                    if kits_colors:  # only proceed if we have valid kit colors
                        kits_clf = get_kits_classifier(kits_colors)
                        if kits_clf is not None:
                            left_team_label = int(get_left_team_label(players_boxes, kits_colors, kits_clf))

        # Assign team/class labels and clamp boxes, then strip numpy types so
        # the output is directly JSON-serializable.
        processed_results = process_team_identification_batch(
            batch_frames, raw_results, kits_clf, left_team_label, grass_hsv
        )
        results.extend(convert_numpy_types(processed_results))

    return results, kits_clf, left_team_label, grass_hsv
|