Commit 0985519 (verified) · Vranlee committed · Parent: 74f4a08

Upload 552 files

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)
  1. .gitattributes +1 -0
  2. .gitignore +1 -0
  3. LICENSE +21 -0
  4. README.md +200 -3
  5. assets/Fig.PNG +3 -0
  6. conda_env.yaml +122 -0
  7. deploy/ONNXRuntime/README.md +19 -0
  8. deploy/ONNXRuntime/onnx_inference.py +161 -0
  9. deploy/TensorRT/cpp/CMakeLists.txt +39 -0
  10. deploy/TensorRT/cpp/README.md +58 -0
  11. deploy/TensorRT/cpp/include/BYTETracker.h +49 -0
  12. deploy/TensorRT/cpp/include/STrack.h +50 -0
  13. deploy/TensorRT/cpp/include/dataType.h +36 -0
  14. deploy/TensorRT/cpp/include/kalmanFilter.h +31 -0
  15. deploy/TensorRT/cpp/include/lapjv.h +63 -0
  16. deploy/TensorRT/cpp/include/logging.h +503 -0
  17. deploy/TensorRT/cpp/src/BYTETracker.cpp +241 -0
  18. deploy/TensorRT/cpp/src/STrack.cpp +192 -0
  19. deploy/TensorRT/cpp/src/bytetrack.cpp +505 -0
  20. deploy/TensorRT/cpp/src/kalmanFilter.cpp +152 -0
  21. deploy/TensorRT/cpp/src/lapjv.cpp +343 -0
  22. deploy/TensorRT/cpp/src/utils.cpp +429 -0
  23. deploy/TensorRT/python/README.md +22 -0
  24. deploy/ncnn/cpp/CMakeLists.txt +84 -0
  25. deploy/ncnn/cpp/README.md +103 -0
  26. deploy/ncnn/cpp/include/BYTETracker.h +49 -0
  27. deploy/ncnn/cpp/include/STrack.h +50 -0
  28. deploy/ncnn/cpp/include/dataType.h +36 -0
  29. deploy/ncnn/cpp/include/kalmanFilter.h +31 -0
  30. deploy/ncnn/cpp/include/lapjv.h +63 -0
  31. deploy/ncnn/cpp/src/BYTETracker.cpp +241 -0
  32. deploy/ncnn/cpp/src/STrack.cpp +192 -0
  33. deploy/ncnn/cpp/src/bytetrack.cpp +396 -0
  34. deploy/ncnn/cpp/src/kalmanFilter.cpp +152 -0
  35. deploy/ncnn/cpp/src/lapjv.cpp +343 -0
  36. deploy/ncnn/cpp/src/utils.cpp +429 -0
  37. deploy/scripts/export_onnx.py +102 -0
  38. deploy/scripts/trt.py +74 -0
  39. docs/DEPLOY.md +38 -0
  40. exps/SU-T-ReID.py +162 -0
  41. exps/SU-T.py +152 -0
  42. exps/default/nano.py +39 -0
  43. exps/default/yolov3.py +89 -0
  44. exps/default/yolox_l.py +15 -0
  45. exps/default/yolox_m.py +15 -0
  46. exps/default/yolox_s.py +15 -0
  47. exps/default/yolox_tiny.py +19 -0
  48. exps/default/yolox_x.py +15 -0
  49. fast_reid/CHANGELOG.md +39 -0
  50. fast_reid/GETTING_STARTED.md +62 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/Fig.PNG filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1 @@
+ .DS_Store
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 LI Weiran
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,3 +1,200 @@
- ---
- license: apache-2.0
- ---
+ # When Trackers Date Fish: A Benchmark and Framework for Underwater Multiple Fish Tracking
+
+ The official implementation of the paper:
+ > [**When Trackers Date Fish: A Benchmark and Framework for Underwater Multiple Fish Tracking**](https://vranlee.github.io/SU-T/)
+ > Weiran Li, Yeqiang Liu, Qiannan Guo, Yijie Wei, Hwa Liang Leo, Zhenbo Li*
+ > [**\[Project\]**](https://vranlee.github.io/SU-T/) [**\[Paper\]**](https://arxiv.org/abs/2507.06400) [**\[Code\]**](https://github.com/vranlee/SU-T)
+
+ <div align="center">
+ <img src="assets/Fig.PNG" width="900"/>
+ </div>
+
+ > Contact: vranlee@cau.edu.cn or weiranli@u.nus.edu. Any questions or discussion are welcome!
+ >
+ > If you like this work, a star 🌟 would be much appreciated!
+
+ -----
+
+ ## 📌Updates
+ + [2025.07] Paper released to arXiv.
+ + [2025.07] Fixed bugs.
+ + [2025.04] We have released the MFT25 dataset and the code of SU-T!
+ -----
+
+ ## 💡Abstract
+ Multiple object tracking (MOT) technology has made significant progress in terrestrial applications, but underwater tracking scenarios remain underexplored despite their importance to marine ecology and aquaculture. We present the Multiple Fish Tracking Dataset 2025 (MFT25), the first comprehensive dataset specifically designed for underwater multiple fish tracking, featuring 15 diverse video sequences with 408,578 meticulously annotated bounding boxes across 48,066 frames. Our dataset captures various underwater environments, fish species, and challenging conditions including occlusions, similar appearances, and erratic motion patterns. Additionally, we introduce the Scale-aware and Unscented Tracker (SU-T), a specialized tracking framework featuring an Unscented Kalman Filter (UKF) optimized for non-linear fish swimming patterns and a novel Fish-Intersection-over-Union (FishIoU) matching that accounts for the unique morphological characteristics of aquatic species. Extensive experiments demonstrate that our SU-T baseline achieves state-of-the-art performance on MFT25, with 34.1 HOTA and 44.6 IDF1, while revealing fundamental differences between fish tracking and terrestrial object tracking scenarios. MFT25 establishes a robust foundation for advancing research in underwater tracking systems with important applications in marine biology, aquaculture monitoring, and ecological conservation.
+
+ ## 🏆Contributions
+
+ + We introduce MFT25, the first comprehensive multiple fish tracking dataset featuring 15 diverse video sequences with 408,578 meticulously annotated bounding boxes across 48,066 frames, capturing various underwater environments, fish species, and challenging conditions including occlusions, rapid direction changes, and visually similar appearances.
+
+ + We propose SU-T, a specialized tracking framework featuring an Unscented Kalman Filter (UKF) optimized for non-linear fish swimming patterns and a novel Fish-Intersection-over-Union (FishIoU) matching that accounts for the unique morphological characteristics and erratic movement behaviors of aquatic species.
+
+ + We conduct extensive comparative experiments demonstrating that our tracker achieves state-of-the-art performance on MFT25, with 34.1 HOTA and 44.6 IDF1. Through quantitative analysis, we highlight the fundamental differences between fish tracking and land-based object tracking scenarios.
+
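The unscented prediction/update cycle behind the UKF mentioned above can be sketched in plain numpy. This is a minimal illustration, not the SU-T implementation: the 8-dimensional bounding-box state layout, the constant-velocity `fx` (a stand-in for a non-linear fish motion model, which would slot in unchanged), and all noise settings are assumptions.

```python
import numpy as np

def sigma_points(x, P, lam):
    """2n+1 symmetric sigma points for state x with covariance P."""
    n = x.size
    S = np.linalg.cholesky((n + lam) * P)
    return np.vstack([x, x + S.T, x - S.T])

def ukf_step(x, P, z, fx, hx, Q, R):
    """One unscented predict + update cycle (minimal weights, no alpha/beta tuning)."""
    n = x.size
    lam = 3.0 - n
    W = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    W[0] = lam / (n + lam)

    # predict: push sigma points through the (possibly non-linear) motion model
    X = np.array([fx(s) for s in sigma_points(x, P, lam)])
    x_pred = W @ X
    P_pred = Q + (W * (X - x_pred).T) @ (X - x_pred)

    # update: regenerate sigma points around the prediction, push through hx
    X2 = sigma_points(x_pred, P_pred, lam)
    Z = np.array([hx(s) for s in X2])
    z_pred = W @ Z
    P_zz = R + (W * (Z - z_pred).T) @ (Z - z_pred)
    P_xz = (W * (X2 - x_pred).T) @ (Z - z_pred)
    K = P_xz @ np.linalg.inv(P_zz)
    return x_pred + K @ (z - z_pred), P_pred - K @ P_zz @ K.T

# Assumed state: [cx, cy, aspect, h] plus their velocities; measurement: [cx, cy, aspect, h]
F = np.eye(8)
F[:4, 4:] = np.eye(4)          # constant-velocity stand-in for a fish motion model
fx = lambda s: F @ s
hx = lambda s: s[:4]           # observe position/shape components only

x = np.array([100., 50., 0.4, 30., 0., 0., 0., 0.])
P = np.eye(8)
z = np.array([102., 51., 0.4, 31.])  # detection for this frame
x, P = ukf_step(x, P, z, fx, hx, Q=np.eye(8), R=np.eye(4))
```

Because sigma points are propagated through `fx` itself, replacing the linear `fx` with a turning or burst-swimming model requires no other changes, which is the appeal of the UKF for erratic fish motion.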
+ ## 🛠️Installation Guide
+
+ ### Prerequisites
+ - CUDA >= 10.2
+ - Python >= 3.7
+ - PyTorch >= 1.7.0
+ - Ubuntu 18.04 or later (Windows is also supported but may require additional setup)
+
+ ### Step-by-Step Installation
+
+ 1. **Clone the Repository**
+ ```bash
+ git clone https://github.com/vranlee/SU-T.git
+ cd SU-T
+ ```
+
+ 2. **Create and Activate Conda Environment**
+ ```bash
+ # Create the environment from the yaml file
+ conda env create -f conda_env.yaml
+
+ # Activate the environment
+ conda activate su_t
+ ```
+
+ 3. **Download Required Resources**
+ - Download pretrained models from [BaiduYun (Password: 9uqc)](https://pan.baidu.com/s/1AkIuViwXCPz5l5Oo-UgtaQ?pwd=9uqc)
+ - Download the MFT25 dataset from [BaiduYun (Password: wrbg)](https://pan.baidu.com/s/11TkRqNIq4poNAU5dyoL5hA?pwd=wrbg)
+
+ 4. **Organize the Directory Structure**
+ ```
+ SU-T/
+ ├── pretrained/
+ │   └── Checkpoint.pth.tar
+ ├── MFT25/
+ │   ├── train/
+ │   └── test/
+ └── ...
+ ```
+
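A quick sanity check of the layout above before launching training can be sketched as follows; the paths are the ones shown in the directory tree, and the helper name is ours, not part of the repo:

```python
from pathlib import Path

# Entries taken from the directory tree above
expected = ["pretrained/Checkpoint.pth.tar", "MFT25/train", "MFT25/test"]

def missing_paths(root="."):
    """Return the expected entries that do not exist under root."""
    return [p for p in expected if not (Path(root) / p).exists()]
```

In a fresh checkout without the downloads, `missing_paths()` returns all three entries; an empty list means the layout matches.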
+ ## 🍭Usage Guide
+
+ ### Training
+
+ 1. **Basic Training Command**
+ ```bash
+ # -f: base model configuration   -d: number of GPUs   -b: batch size
+ # --fp16: enable mixed precision training   -o: occupy GPU memory ahead of training
+ # -c: path to pretrained weights
+ python tools/train.py \
+     -f exps/SU-T.py \
+     -d 8 \
+     -b 48 \
+     --fp16 \
+     -o \
+     -c pretrained/Checkpoint.pth.tar
+ ```
+
+ 2. **Training with ReID Module**
+ ```bash
+ python tools/train.py \
+     -f exps/SU-T-ReID.py \
+     -d 8 \
+     -b 48 \
+     --fp16 \
+     -o \
+     -c pretrained/Checkpoint.pth.tar
+ ```
+
+ ### Testing
+
+ 1. **Basic Testing Command**
+ ```bash
+ # -f: model configuration   -b: batch size   -d: number of GPUs
+ # --fp16: enable mixed precision   --fuse: enable model fusion   --expn: experiment name
+ python tools/su_tracker.py \
+     -f exps/SU-T.py \
+     -b 1 \
+     -d 1 \
+     --fp16 \
+     --fuse \
+     --expn your_exp_name
+ ```
+
+ 2. **Testing with ReID Module**
+ ```bash
+ python tools/su_tracker.py \
+     -f exps/SU-T-ReID.py \
+     -b 1 \
+     -d 1 \
+     --fp16 \
+     --fuse \
+     --expn your_exp_name
+ ```
+
+ ### Additional Configuration Options
+
+ - **Model Configuration**: Edit `exps/SU-T.py` or `exps/SU-T-ReID.py` to modify:
+   - Learning rate
+   - Training epochs
+   - Data augmentation parameters
+   - Model architecture settings
+
+ - **Training Parameters**:
+   ```bash
+   # Additional training options
+   --cache          # Cache images in RAM
+   --resume         # Resume from a specific checkpoint
+   --trt            # Export TensorRT model
+   ```
+
+ - **Testing Parameters**:
+   ```bash
+   # Additional testing options
+   --tsize          # Test image size
+   --conf           # Confidence threshold
+   --nms            # NMS threshold
+   --track_thresh   # Tracking threshold
+   ```
+
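The extra flags listed above can be wired into an `argparse` parser along these lines; the defaults here are illustrative assumptions, not the values used by `tools/su_tracker.py`:

```python
import argparse

def make_extra_parser():
    # Flags mirror the "Additional Configuration Options" lists above;
    # default values are assumptions for illustration only.
    p = argparse.ArgumentParser("additional training/testing options (sketch)")
    p.add_argument("--cache", action="store_true", help="cache images in RAM")
    p.add_argument("--resume", action="store_true", help="resume from a checkpoint")
    p.add_argument("--trt", action="store_true", help="export a TensorRT model")
    p.add_argument("--tsize", type=int, default=None, help="test image size")
    p.add_argument("--conf", type=float, default=None, help="confidence threshold")
    p.add_argument("--nms", type=float, default=None, help="NMS threshold")
    p.add_argument("--track_thresh", type=float, default=0.6, help="tracking threshold")
    return p

args = make_extra_parser().parse_args(["--conf", "0.25", "--cache"])
```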
+ ## 📜Tracking Performance
+
+ ### Comparisons on the MFT25 dataset
+
+ | Method | Class | Year | HOTA↑ | IDF1↑ | MOTA↑ | AssA↑ | DetA↑ | IDs↓ | IDFP↓ | IDFN↓ | Frag↓ |
+ |--------|-------|------|-------|-------|-------|-------|-------|------|-------|-------|-------|
+ | FairMOT | JDE | 2021 | 22.226 | 26.867 | 47.509 | 13.910 | 35.606 | 939 | 58198 | 113393 | 3768 |
+ | CMFTNet | JDE | 2022 | 22.432 | 27.659 | 46.365 | 14.278 | 35.452 | 1301 | 64754 | 111263 | 2769 |
+ | TransTrack | TF | 2021 | 30.426 | 35.215 | 68.983 | 18.525 | _50.458_ | 1116 | 96045 | 93418 | 2588 |
+ | TransCenter | TF | 2023 | 27.896 | 30.278 | 68.693 | **30.255** | 30.301 | 807 | 101223 | 101002 | 1992 |
+ | TrackFormer | TF | 2022 | 30.361 | 35.285 | **74.609** | 17.661 | **52.649** | 718 | 89391 | 94720 | 1729 |
+ | TFMFT | TF | 2024 | 25.440 | 33.950 | 49.725 | 17.112 | 38.059 | 719 | 63125 | 102378 | 3251 |
+ | SORT | SDE | 2016 | 29.063 | 34.119 | 69.038 | 16.952 | 50.195 | 778 | 88928 | 96815 | _1726_ |
+ | ByteTrack | SDE | 2022 | 31.758 | 40.355 | _69.586_ | 20.392 | 49.712 | **489** | 80765 | 87866 | **1555** |
+ | BoT-SORT | SDE | 2022 | 26.848 | 36.847 | 49.108 | 19.446 | 37.241 | _500_ | 57581 | 99181 | 2704 |
+ | OC-SORT | SDE | 2023 | 25.017 | 34.620 | 46.706 | 17.783 | 35.369 | 550 | **52934** | 103495 | 3651 |
+ | Deep-OC-SORT | SDE | 2023 | 24.848 | 34.176 | 46.721 | 17.537 | 35.373 | 550 | _53478_ | 104024 | 3659 |
+ | HybridSORT | SDE | 2024 | 32.258 | 38.421 | 68.905 | 20.936 | 49.992 | 613 | 85924 | 90022 | 1931 |
+ | HybridSORT† | SDE | 2024 | 32.705 | _41.727_ | 69.167 | 21.701 | 49.697 | 562 | 79189 | 85830 | 1963 |
+ | **SU-T (Ours)** | SDE | 2025 | _33.351_ | 41.717 | 68.450 | 22.425 | 49.943 | 607 | 83111 | _84814_ | 2006 |
+ | **SU-T† (Ours)** | SDE | 2025 | **34.067** | **44.643** | 68.958 | _23.594_ | 49.531 | 544 | 76440 | **81304** | 2011 |
+
+ *Note: † indicates the integration of the ReID module; **bold** marks the best result and _italics_ the second-best.*
+
+ ## ⁉️Troubleshooting
+
+ ### Common Issues
+
+ 1. **CUDA Out of Memory**
+    - Reduce the batch size
+    - Use a smaller input resolution
+    - Enable mixed precision training
+
+ 2. **Installation Failures**
+    - Ensure the CUDA toolkit matches the PyTorch version
+    - Try creating the environment with `pip` if conda fails
+    - Check system CUDA compatibility
+
+ 3. **Training Issues**
+    - Verify the dataset path and structure
+    - Check GPU memory usage
+    - Monitor the learning rate and loss curves
+
+ ## 💕Acknowledgement
+ A large part of the code is borrowed from [ByteTrack](https://github.com/ifzhang/ByteTrack), [OC_SORT](https://github.com/noahcao/OC_SORT), and [HybridSORT](https://github.com/ymzis69/HybridSORT). Thanks for their wonderful work!
+
+ ## 📖Citation
+ The citation format will be provided once the manuscript is accepted. Please cite the arXiv version for now.
+
+ ## 📑License
+ This project is released under the [MIT License](LICENSE).
assets/Fig.PNG ADDED

Git LFS Details

  • SHA256: 85d5a071290117b9f1e6e918981726b6b894ff9f1675d121867db94a8172cfc5
  • Pointer size: 132 Bytes
  • Size of remote file: 2.15 MB
conda_env.yaml ADDED
@@ -0,0 +1,122 @@
+ name: su_t
+ channels:
+   - defaults
+ dependencies:
+   - _libgcc_mutex=0.1=main
+   - _openmp_mutex=5.1=1_gnu
+   - bzip2=1.0.8=h5eee18b_6
+   - ca-certificates=2025.2.25=h06a4308_0
+   - ld_impl_linux-64=2.40=h12ee557_0
+   - libffi=3.3=he6710b0_2
+   - libgcc-ng=11.2.0=h1234567_1
+   - libgomp=11.2.0=h1234567_1
+   - libstdcxx-ng=11.2.0=h1234567_1
+   - libuuid=1.41.5=h5eee18b_0
+   - ncurses=6.4=h6a678d5_0
+   - openssl=1.1.1w=h7f8727e_0
+   - pip=25.0=py310h06a4308_0
+   - python=3.10.0=h12debd9_5
+   - readline=8.2=h5eee18b_0
+   - setuptools=75.8.0=py310h06a4308_0
+   - sqlite=3.45.3=h5eee18b_0
+   - tk=8.6.14=h39e8969_0
+   - wheel=0.45.1=py310h06a4308_0
+   - xz=5.6.4=h5eee18b_1
+   - zlib=1.2.13=h5eee18b_1
+   - pip:
+     - absl-py==2.1.0
+     - beautifulsoup4==4.13.3
+     - certifi==2025.1.31
+     - charset-normalizer==3.4.1
+     - coloredlogs==15.0.1
+     - contourpy==1.3.1
+     - cycler==0.12.1
+     - cython==3.0.12
+     - cython-bbox==0.1.5
+     - easydict==1.13
+     - einops==0.8.1
+     - faiss-gpu==1.7.2
+     - filelock==3.13.1
+     - filterpy==1.4.5
+     - flatbuffers==25.2.10
+     - fonttools==4.56.0
+     - fsspec==2024.6.1
+     - gdown==5.2.0
+     - grpcio==1.70.0
+     - h5py==3.13.0
+     - humanfriendly==10.0
+     - idna==3.10
+     - imageio==2.37.0
+     - jinja2==3.1.4
+     - joblib==1.4.2
+     - kiwisolver==1.4.8
+     - lap==0.5.12
+     - lazy-loader==0.4
+     - loguru==0.7.3
+     - markdown==3.7
+     - markdown-it-py==3.0.0
+     - markupsafe==2.1.5
+     - matplotlib==3.10.1
+     - mdurl==0.1.2
+     - mpmath==1.3.0
+     - networkx==3.3
+     - ninja==1.11.1.3
+     - numpy==1.23.4
+     - nvidia-cublas-cu11==11.11.3.6
+     - nvidia-cuda-cupti-cu11==11.8.87
+     - nvidia-cuda-nvrtc-cu11==11.8.89
+     - nvidia-cuda-runtime-cu11==11.8.89
+     - nvidia-cudnn-cu11==9.1.0.70
+     - nvidia-cufft-cu11==10.9.0.58
+     - nvidia-curand-cu11==10.3.0.86
+     - nvidia-cusolver-cu11==11.4.1.48
+     - nvidia-cusparse-cu11==11.7.5.86
+     - nvidia-ml-py==12.570.86
+     - nvidia-nccl-cu11==2.21.5
+     - nvidia-nvtx-cu11==11.8.86
+     - nvitop==1.4.2
+     - onnx==1.17.0
+     - onnx-simplifier==0.4.36
+     - onnxruntime==1.12.0
+     - opencv-python==4.11.0.86
+     - packaging==24.2
+     - pandas==2.2.3
+     - pillow==11.0.0
+     - prettytable==3.15.1
+     - protobuf==6.30.0
+     - psutil==7.0.0
+     - pygments==2.19.1
+     - pyparsing==3.2.1
+     - pysocks==1.7.1
+     - python-dateutil==2.9.0.post0
+     - pytz==2025.1
+     - pyyaml==6.0.2
+     - requests==2.32.3
+     - rich==13.9.4
+     - scikit-image==0.24.0
+     - scikit-learn==1.6.1
+     - scipy==1.13.1
+     - six==1.17.0
+     - soupsieve==2.6
+     - sympy==1.13.1
+     - tabulate==0.9.0
+     - tensorboard==2.19.0
+     - tensorboard-data-server==0.7.2
+     - termcolor==2.5.0
+     - thop==0.1.1-2209072238
+     - threadpoolctl==3.5.0
+     - tifffile==2025.2.18
+     - torch==2.6.0+cu118
+     - torchaudio==2.6.0+cu118
+     - torchvision==0.21.0+cu118
+     - tqdm==4.67.1
+     - triton==3.2.0
+     - typing-extensions==4.12.2
+     - tzdata==2025.1
+     - urllib3==2.3.0
+     - vit-pytorch==1.9.2
+     - wcwidth==0.2.13
+     - werkzeug==3.1.3
+     - xmltodict==0.14.2
+     - yacs==0.1.8
+ prefix: /home/weiranli/anaconda3/envs/su_t
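After activating the environment, a small sanity check that the pins above actually resolved can be sketched with the standard library; the helper and the sample of pinned packages are chosen for illustration:

```python
import importlib.metadata as md

def check_pins(pins):
    """Map each package to (pinned_version, installed_version_or_None)."""
    report = {}
    for pkg, pinned in pins.items():
        try:
            installed = md.version(pkg)
        except md.PackageNotFoundError:
            installed = None
        report[pkg] = (pinned, installed)
    return report

# A few of the pins from conda_env.yaml above
report = check_pins({"numpy": "1.23.4", "opencv-python": "4.11.0.86", "loguru": "0.7.3"})
mismatched = {k: v for k, v in report.items() if v[1] != v[0]}
```

A non-empty `mismatched` (or a `None` installed version) usually means the env was created partially or a pip step failed.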
deploy/ONNXRuntime/README.md ADDED
@@ -0,0 +1,19 @@
+ ## ByteTrack-ONNXRuntime in Python
+
+ This doc introduces how to convert your PyTorch model into ONNX, and how to run an ONNXRuntime demo to verify your conversion.
+
+ ### Convert Your Model to ONNX
+
+ ```shell
+ cd <ByteTrack_HOME>
+ python3 tools/export_onnx.py --output-name bytetrack_s.onnx -f exps/example/mot/yolox_s_mix_det.py -c pretrained/bytetrack_s_mot17.pth.tar
+ ```
+
+ ### ONNXRuntime Demo
+
+ You can run the ONNX demo at **16 FPS** (on a 96-core Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz):
+
+ ```shell
+ cd <ByteTrack_HOME>/deploy/ONNXRuntime
+ python3 onnx_inference.py
+ ```
deploy/ONNXRuntime/onnx_inference.py ADDED
@@ -0,0 +1,161 @@
+ import argparse
+ import os
+
+ import cv2
+ import numpy as np
+ from loguru import logger
+
+ import onnxruntime
+
+ from yolox.data.data_augment import preproc as preprocess
+ from yolox.utils import mkdir, multiclass_nms, demo_postprocess, vis
+ from yolox.utils.visualize import plot_tracking
+ from trackers.ocsort_tracker.ocsort import OCSort
+ from trackers.tracking_utils.timer import Timer
+
+
+ def make_parser():
+     parser = argparse.ArgumentParser("onnxruntime inference sample")
+     parser.add_argument(
+         "-m",
+         "--model",
+         type=str,
+         default="../../ocsort.onnx",
+         help="Input your onnx model.",
+     )
+     parser.add_argument(
+         "-i",
+         "--video_path",
+         type=str,
+         default='../../videos/dance_demo.mp4',
+         help="Path to your input video.",
+     )
+     parser.add_argument(
+         "-o",
+         "--output_dir",
+         type=str,
+         default='demo_output',
+         help="Path to your output directory.",
+     )
+     parser.add_argument(
+         "-s",
+         "--score_thr",
+         type=float,
+         default=0.1,
+         help="Score threshold to filter the result.",
+     )
+     parser.add_argument(
+         "-n",
+         "--nms_thr",
+         type=float,
+         default=0.7,
+         help="NMS threshold.",
+     )
+     parser.add_argument(
+         "--input_shape",
+         type=str,
+         default="800,1440",
+         help="Specify an input shape for inference.",
+     )
+     parser.add_argument(
+         "--with_p6",
+         action="store_true",
+         help="Whether your model uses p6 in FPN/PAN.",
+     )
+     # tracking args
+     parser.add_argument("--track_thresh", type=float, default=0.6, help="tracking confidence threshold")
+     parser.add_argument("--iou_thresh", type=float, default=0.3, help="IoU threshold for association")
+     parser.add_argument("--track_buffer", type=int, default=30, help="frames to keep lost tracks")
+     parser.add_argument("--match_thresh", type=float, default=0.8, help="matching threshold for tracking")
+     parser.add_argument('--min-box-area', type=float, default=10, help='filter out tiny boxes')
+     parser.add_argument("--mot20", dest="mot20", default=False, action="store_true", help="test mot20.")
+     return parser
+
+
+ class Predictor(object):
+     def __init__(self, args):
+         self.rgb_means = (0.485, 0.456, 0.406)
+         self.std = (0.229, 0.224, 0.225)
+         self.args = args
+         self.session = onnxruntime.InferenceSession(args.model)
+         self.input_shape = tuple(map(int, args.input_shape.split(',')))
+
+     def inference(self, ori_img, timer):
+         img_info = {"id": 0}
+         height, width = ori_img.shape[:2]
+         img_info["height"] = height
+         img_info["width"] = width
+         img_info["raw_img"] = ori_img
+
+         img, ratio = preprocess(ori_img, self.input_shape, self.rgb_means, self.std)
+         img_info["ratio"] = ratio
+         ort_inputs = {self.session.get_inputs()[0].name: img[None, :, :, :]}
+         timer.tic()
+         output = self.session.run(None, ort_inputs)
+         predictions = demo_postprocess(output[0], self.input_shape, p6=self.args.with_p6)[0]
+
+         boxes = predictions[:, :4]
+         scores = predictions[:, 4:5] * predictions[:, 5:]
+
+         # convert (cx, cy, w, h) boxes to (x1, y1, x2, y2) and undo the resize ratio
+         boxes_xyxy = np.ones_like(boxes)
+         boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
+         boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
+         boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
+         boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
+         boxes_xyxy /= ratio
+         dets = multiclass_nms(boxes_xyxy, scores, nms_thr=self.args.nms_thr, score_thr=self.args.score_thr)
+         return dets[:, :-1], img_info
+
+
+ def imageflow_demo(predictor, args):
+     cap = cv2.VideoCapture(args.video_path)
+     width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)   # float
+     height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
+     fps = cap.get(cv2.CAP_PROP_FPS)
+     save_folder = args.output_dir
+     os.makedirs(save_folder, exist_ok=True)
+     save_path = os.path.join(save_folder, args.video_path.split("/")[-1])
+     logger.info(f"video save_path is {save_path}")
+     vid_writer = cv2.VideoWriter(
+         save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (int(width), int(height))
+     )
+     tracker = OCSort(det_thresh=args.track_thresh, iou_threshold=args.iou_thresh)
+     timer = Timer()
+     frame_id = 0
+     results = []
+     while True:
+         if frame_id % 20 == 0:
+             logger.info('Processing frame {} ({:.2f} fps)'.format(frame_id, 1. / max(1e-5, timer.average_time)))
+         ret_val, frame = cap.read()
+         if ret_val:
+             outputs, img_info = predictor.inference(frame, timer)
+             online_targets = tracker.update(outputs, [img_info['height'], img_info['width']], [img_info['height'], img_info['width']])
+             online_tlwhs = []
+             online_ids = []
+             for t in online_targets:
+                 tlwh = [t[0], t[1], t[2] - t[0], t[3] - t[1]]
+                 tid = t[4]
+                 vertical = tlwh[2] / tlwh[3] > 1.6
+                 if tlwh[2] * tlwh[3] > args.min_box_area and not vertical:
+                     online_tlwhs.append(tlwh)
+                     online_ids.append(tid)
+             timer.toc()
+             results.append((frame_id + 1, online_tlwhs, online_ids))
+             online_im = plot_tracking(img_info['raw_img'], online_tlwhs, online_ids, frame_id=frame_id + 1,
+                                       fps=1. / timer.average_time)
+             vid_writer.write(online_im)
+             ch = cv2.waitKey(1)
+             if ch == 27 or ch == ord("q") or ch == ord("Q"):
+                 break
+         else:
+             break
+         frame_id += 1
+
+
+ if __name__ == '__main__':
+     args = make_parser().parse_args()
+
+     predictor = Predictor(args)
+     imageflow_demo(predictor, args)
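The box decoding inside `Predictor.inference` (center format to corner format, then undoing the preprocessing resize ratio) can be isolated and checked on its own; the function name is ours:

```python
import numpy as np

def cxcywh_to_xyxy(boxes, ratio=1.0):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2) and undo the resize
    ratio, mirroring the decoding step in onnx_inference.py."""
    boxes = np.asarray(boxes, dtype=float)
    out = np.empty_like(boxes)
    out[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.0
    out[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.0
    out[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.0
    out[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.0
    return out / ratio
```

For example, `cxcywh_to_xyxy([[10, 10, 4, 6]])` yields `[[8, 7, 12, 13]]`: the half-width and half-height are subtracted and added around the center.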
deploy/TensorRT/cpp/CMakeLists.txt ADDED
@@ -0,0 +1,39 @@
+ cmake_minimum_required(VERSION 2.6)
+
+ project(bytetrack)
+
+ add_definitions(-std=c++11)
+
+ option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
+ set(CMAKE_CXX_STANDARD 11)
+ set(CMAKE_BUILD_TYPE Debug)
+
+ find_package(CUDA REQUIRED)
+
+ include_directories(${PROJECT_SOURCE_DIR}/include)
+ include_directories(/usr/local/include/eigen3)
+ link_directories(${PROJECT_SOURCE_DIR}/include)
+ # include and link dirs of cuda and tensorrt; adapt them if yours differ
+ # cuda
+ include_directories(/usr/local/cuda/include)
+ link_directories(/usr/local/cuda/lib64)
+ # cudnn
+ include_directories(/data/cuda/cuda-10.2/cudnn/v8.0.4/include)
+ link_directories(/data/cuda/cuda-10.2/cudnn/v8.0.4/lib64)
+ # tensorrt
+ include_directories(/opt/tiger/demo/TensorRT-7.2.3.4/include)
+ link_directories(/opt/tiger/demo/TensorRT-7.2.3.4/lib)
+
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Ofast -Wfatal-errors -D_MWAITXINTRIN_H_INCLUDED")
+
+ find_package(OpenCV)
+ include_directories(${OpenCV_INCLUDE_DIRS})
+
+ file(GLOB My_Source_Files ${PROJECT_SOURCE_DIR}/src/*.cpp)
+ add_executable(bytetrack ${My_Source_Files})
+ target_link_libraries(bytetrack nvinfer)
+ target_link_libraries(bytetrack cudart)
+ target_link_libraries(bytetrack ${OpenCV_LIBS})
+
+ add_definitions(-O2 -pthread)
deploy/TensorRT/cpp/README.md ADDED
@@ -0,0 +1,58 @@
+ # ByteTrack-TensorRT in C++
+
+ ## Installation
+
+ Install OpenCV with ```sudo apt-get install libopencv-dev``` (a newer build such as v3.3+ is not required).
+
+ Install eigen-3.3.9 [[google]](https://drive.google.com/file/d/1rqO74CYCNrmRAg8Rra0JP3yZtJ-rfket/view?usp=sharing), [[baidu(code:ueq4)]](https://pan.baidu.com/s/15kEfCxpy-T7tz60msxxExg).
+
+ ```shell
+ unzip eigen-3.3.9.zip
+ cd eigen-3.3.9
+ mkdir build
+ cd build
+ cmake ..
+ sudo make install
+ ```
+
+ ## Prepare the serialized engine file
+
+ Follow the TensorRT Python demo to convert and save the serialized engine file.
+
+ Check for the 'model_trt.engine' file, which is automatically saved in the YOLOX_outputs dir.
+
+ ## Build the demo
+
+ You should set the TensorRT path and CUDA path in CMakeLists.txt.
+
+ For the bytetrack_s model, we set the input frame size to 1088 x 608. For the bytetrack_m, bytetrack_l, and bytetrack_x models, we set it to 1440 x 800. You can modify INPUT_W and INPUT_H in src/bytetrack.cpp:
+
+ ```c++
+ static const int INPUT_W = 1088;
+ static const int INPUT_H = 608;
+ ```
+
+ Build the demo:
+
+ ```shell
+ cd <ByteTrack_HOME>/deploy/TensorRT/cpp
+ mkdir build
+ cd build
+ cmake ..
+ make
+ ```
+
+ Then you can run the demo at **200 FPS**:
+
+ ```shell
+ ./bytetrack ../../../../YOLOX_outputs/yolox_s_mix_det/model_trt.engine -i ../../../../videos/palace.mp4
+ ```
+
+ (If you find the output video loses some frames, you can convert the input video by running:
+
+ ```shell
+ cd <ByteTrack_HOME>
+ python3 tools/convert_video.py
+ ```
+
+ to generate an appropriate input video for the TensorRT C++ demo.)
deploy/TensorRT/cpp/include/BYTETracker.h ADDED
@@ -0,0 +1,49 @@
+ #pragma once
+
+ #include "STrack.h"
+
+ struct Object
+ {
+     cv::Rect_<float> rect;
+     int label;
+     float prob;
+ };
+
+ class BYTETracker
+ {
+ public:
+     BYTETracker(int frame_rate = 30, int track_buffer = 30);
+     ~BYTETracker();
+
+     vector<STrack> update(const vector<Object>& objects);
+     Scalar get_color(int idx);
+
+ private:
+     vector<STrack*> joint_stracks(vector<STrack*> &tlista, vector<STrack> &tlistb);
+     vector<STrack> joint_stracks(vector<STrack> &tlista, vector<STrack> &tlistb);
+
+     vector<STrack> sub_stracks(vector<STrack> &tlista, vector<STrack> &tlistb);
+     void remove_duplicate_stracks(vector<STrack> &resa, vector<STrack> &resb, vector<STrack> &stracksa, vector<STrack> &stracksb);
+
+     void linear_assignment(vector<vector<float> > &cost_matrix, int cost_matrix_size, int cost_matrix_size_size, float thresh,
+         vector<vector<int> > &matches, vector<int> &unmatched_a, vector<int> &unmatched_b);
+     vector<vector<float> > iou_distance(vector<STrack*> &atracks, vector<STrack> &btracks, int &dist_size, int &dist_size_size);
+     vector<vector<float> > iou_distance(vector<STrack> &atracks, vector<STrack> &btracks);
+     vector<vector<float> > ious(vector<vector<float> > &atlbrs, vector<vector<float> > &btlbrs);
+
+     double lapjv(const vector<vector<float> > &cost, vector<int> &rowsol, vector<int> &colsol,
+         bool extend_cost = false, float cost_limit = LONG_MAX, bool return_cost = true);
+
+ private:
+     float track_thresh;
+     float high_thresh;
+     float match_thresh;
+     int frame_id;
+     int max_time_lost;
+
+     vector<STrack> tracked_stracks;
+     vector<STrack> lost_stracks;
+     vector<STrack> removed_stracks;
+     byte_kalman::KalmanFilter kalman_filter;
+ };
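The `linear_assignment`/`lapjv` pair declared above solves a one-to-one matching over an IoU cost matrix. A Python equivalent using `scipy` (pinned in conda_env.yaml) looks like this sketch; the helper names and the `match_thresh` gating are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(a, b):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    tl = np.maximum(a[:, None, :2], b[None, :, :2])        # intersection top-left
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])        # intersection bottom-right
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def associate(tracks, dets, match_thresh=0.8):
    """Match tracks to detections on IoU cost, dropping pairs above match_thresh."""
    cost = 1.0 - iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(cost)               # Hungarian-style solver
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= match_thresh]
```

`linear_sum_assignment` plays the role of `lapjv` here; both minimize the total assignment cost, after which the threshold discards low-overlap pairs.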
deploy/TensorRT/cpp/include/STrack.h ADDED
@@ -0,0 +1,50 @@
+ #pragma once
+
+ #include <opencv2/opencv.hpp>
+ #include "kalmanFilter.h"
+
+ using namespace cv;
+ using namespace std;
+
+ enum TrackState { New = 0, Tracked, Lost, Removed };
+
+ class STrack
+ {
+ public:
+     STrack(vector<float> tlwh_, float score);
+     ~STrack();
+
+     vector<float> static tlbr_to_tlwh(vector<float> &tlbr);
+     void static multi_predict(vector<STrack*> &stracks, byte_kalman::KalmanFilter &kalman_filter);
+     void static_tlwh();
+     void static_tlbr();
+     vector<float> tlwh_to_xyah(vector<float> tlwh_tmp);
+     vector<float> to_xyah();
+     void mark_lost();
+     void mark_removed();
+     int next_id();
+     int end_frame();
+
+     void activate(byte_kalman::KalmanFilter &kalman_filter, int frame_id);
+     void re_activate(STrack &new_track, int frame_id, bool new_id = false);
+     void update(STrack &new_track, int frame_id);
+
+ public:
+     bool is_activated;
+     int track_id;
+     int state;
+
+     vector<float> _tlwh;
+     vector<float> tlwh;
+     vector<float> tlbr;
+     int frame_id;
+     int tracklet_len;
+     int start_frame;
+
+     KAL_MEAN mean;
+     KAL_COVA covariance;
+     float score;
+
+ private:
+     byte_kalman::KalmanFilter kalman_filter;
+ };
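The tlwh/tlbr/xyah conversions declared above have straightforward definitions; following the ByteTrack convention this header mirrors, they look like this in Python (function names are ours):

```python
import numpy as np

def tlwh_to_xyah(tlwh):
    """(top-left x, top-left y, w, h) -> (center x, center y, aspect w/h, h),
    the measurement format fed to the Kalman filter."""
    ret = np.asarray(tlwh, dtype=float).copy()
    ret[:2] += ret[2:] / 2.0   # top-left -> center
    ret[2] /= ret[3]           # width -> aspect ratio
    return ret

def tlbr_to_tlwh(tlbr):
    """(x1, y1, x2, y2) -> (top-left x, top-left y, w, h)."""
    ret = np.asarray(tlbr, dtype=float).copy()
    ret[2:] -= ret[:2]         # corners -> width/height
    return ret
```

For example, a 2x4 box at the origin maps to center (1, 2) with aspect ratio 0.5 and height 4.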
deploy/TensorRT/cpp/include/dataType.h ADDED
@@ -0,0 +1,36 @@
+ #pragma once
+
+ #include <cstddef>
+ #include <vector>
+
+ #include <Eigen/Core>
+ #include <Eigen/Dense>
+ typedef Eigen::Matrix<float, 1, 4, Eigen::RowMajor> DETECTBOX;
+ typedef Eigen::Matrix<float, -1, 4, Eigen::RowMajor> DETECTBOXSS;
+ typedef Eigen::Matrix<float, 1, 128, Eigen::RowMajor> FEATURE;
+ typedef Eigen::Matrix<float, Eigen::Dynamic, 128, Eigen::RowMajor> FEATURESS;
+ //typedef std::vector<FEATURE> FEATURESS;
+
+ // Kalman filter
+ //typedef Eigen::Matrix<float, 8, 8, Eigen::RowMajor> KAL_FILTER;
+ typedef Eigen::Matrix<float, 1, 8, Eigen::RowMajor> KAL_MEAN;
+ typedef Eigen::Matrix<float, 8, 8, Eigen::RowMajor> KAL_COVA;
+ typedef Eigen::Matrix<float, 1, 4, Eigen::RowMajor> KAL_HMEAN;
+ typedef Eigen::Matrix<float, 4, 4, Eigen::RowMajor> KAL_HCOVA;
+ using KAL_DATA = std::pair<KAL_MEAN, KAL_COVA>;
+ using KAL_HDATA = std::pair<KAL_HMEAN, KAL_HCOVA>;
+
+ // main
+ using RESULT_DATA = std::pair<int, DETECTBOX>;
+
+ // tracker
+ using TRACKER_DATA = std::pair<int, FEATURESS>;
+ using MATCH_DATA = std::pair<int, int>;
+ typedef struct t {
+     std::vector<MATCH_DATA> matches;
+     std::vector<int> unmatched_tracks;
+     std::vector<int> unmatched_detections;
+ } TRACHER_MATCHD;
+
+ // linear_assignment
+ typedef Eigen::Matrix<float, -1, -1, Eigen::RowMajor> DYNAMICM;
deploy/TensorRT/cpp/include/kalmanFilter.h ADDED
@@ -0,0 +1,31 @@
+ #pragma once
+
+ #include "dataType.h"
+
+ namespace byte_kalman
+ {
+     class KalmanFilter
+     {
+     public:
+         static const double chi2inv95[10];
+         KalmanFilter();
+         KAL_DATA initiate(const DETECTBOX& measurement);
+         void predict(KAL_MEAN& mean, KAL_COVA& covariance);
+         KAL_HDATA project(const KAL_MEAN& mean, const KAL_COVA& covariance);
+         KAL_DATA update(const KAL_MEAN& mean,
+             const KAL_COVA& covariance,
+             const DETECTBOX& measurement);
+
+         Eigen::Matrix<float, 1, -1> gating_distance(
+             const KAL_MEAN& mean,
+             const KAL_COVA& covariance,
+             const std::vector<DETECTBOX>& measurements,
+             bool only_position = false);
+
+     private:
+         Eigen::Matrix<float, 8, 8, Eigen::RowMajor> _motion_mat;
+         Eigen::Matrix<float, 4, 8, Eigen::RowMajor> _update_mat;
+         float _std_weight_position;
+         float _std_weight_velocity;
+     };
+ }
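The header above only declares the tracker's 8-state (xyah plus velocities) filter; the Eigen implementation lives in kalmanFilter.cpp. As a self-contained illustration of the same predict/update cycle, and not the repo's actual code, here is a minimal 1-D constant-velocity Kalman filter in plain C++. The `Kalman1D` name and its noise parameters are hypothetical, chosen only for this sketch.

```cpp
#include <cassert>
#include <cmath>

// State: position p and velocity v; we measure position only (H = [1 0]).
struct Kalman1D {
    double p = 0.0, v = 0.0;            // state mean
    double P[2][2] = {{1, 0}, {0, 1}};  // state covariance
    double q = 1e-3;                    // process noise (assumed)
    double r = 1e-1;                    // measurement noise (assumed)

    // Predict: propagate state through F = [[1, dt], [0, 1]], grow covariance.
    void predict(double dt) {
        p += v * dt;
        double P00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q;
        double P01 = P[0][1] + dt * P[1][1];
        double P10 = P[1][0] + dt * P[1][1];
        double P11 = P[1][1] + q;
        P[0][0] = P00; P[0][1] = P01; P[1][0] = P10; P[1][1] = P11;
    }

    // Update with a position measurement z.
    void update(double z) {
        double S = P[0][0] + r;    // innovation covariance
        double K0 = P[0][0] / S;   // Kalman gain (position row)
        double K1 = P[1][0] / S;   // Kalman gain (velocity row)
        double y = z - p;          // innovation
        p += K0 * y;
        v += K1 * y;
        double P00 = (1 - K0) * P[0][0];
        double P01 = (1 - K0) * P[0][1];
        double P10 = P[1][0] - K1 * P[0][0];
        double P11 = P[1][1] - K1 * P[0][1];
        P[0][0] = P00; P[0][1] = P01; P[1][0] = P10; P[1][1] = P11;
    }
}; 
```

The repo's `predict`/`update` pair applies the same two equations, just with an 8-dimensional state and a 4-dimensional xyah measurement.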
deploy/TensorRT/cpp/include/lapjv.h ADDED
@@ -0,0 +1,63 @@
+ #ifndef LAPJV_H
+ #define LAPJV_H
+
+ #define LARGE 1000000
+
+ #if !defined TRUE
+ #define TRUE 1
+ #endif
+ #if !defined FALSE
+ #define FALSE 0
+ #endif
+
+ #define NEW(x, t, n) if ((x = (t *)malloc(sizeof(t) * (n))) == 0) { return -1; }
+ #define FREE(x) if (x != 0) { free(x); x = 0; }
+ #define SWAP_INDICES(a, b) { int_t _temp_index = a; a = b; b = _temp_index; }
+
+ #if 0
+ #include <assert.h>
+ #define ASSERT(cond) assert(cond)
+ #define PRINTF(fmt, ...) printf(fmt, ##__VA_ARGS__)
+ #define PRINT_COST_ARRAY(a, n) \
+     while (1) { \
+         printf(#a" = ["); \
+         if ((n) > 0) { \
+             printf("%f", (a)[0]); \
+             for (uint_t j = 1; j < n; j++) { \
+                 printf(", %f", (a)[j]); \
+             } \
+         } \
+         printf("]\n"); \
+         break; \
+     }
+ #define PRINT_INDEX_ARRAY(a, n) \
+     while (1) { \
+         printf(#a" = ["); \
+         if ((n) > 0) { \
+             printf("%d", (a)[0]); \
+             for (uint_t j = 1; j < n; j++) { \
+                 printf(", %d", (a)[j]); \
+             } \
+         } \
+         printf("]\n"); \
+         break; \
+     }
+ #else
+ #define ASSERT(cond)
+ #define PRINTF(fmt, ...)
+ #define PRINT_COST_ARRAY(a, n)
+ #define PRINT_INDEX_ARRAY(a, n)
+ #endif
+
+ typedef signed int int_t;
+ typedef unsigned int uint_t;
+ typedef double cost_t;
+ typedef char boolean;
+ typedef enum fp_t { FP_1 = 1, FP_2 = 2, FP_DYNAMIC = 3 } fp_t;
+
+ extern int_t lapjv_internal(
+     const uint_t n, cost_t *cost[],
+     int_t *x, int_t *y);
+
+ #endif // LAPJV_H
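`lapjv_internal` implements the Jonker-Volgenant algorithm for the linear assignment problem, which the tracker uses to match tracks to detections under an IoU cost matrix. To make the objective it solves concrete, here is a self-contained brute-force solver over all permutations; this is only a sketch of the problem, not of the LAPJV algorithm, and `brute_force_assignment` is an illustrative name that does not exist in the repo. LAPJV solves the same problem in O(n^3) instead of O(n!).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Minimum-cost one-to-one assignment of rows to columns, by exhaustive
// search over all column permutations. Feasible only for tiny n.
double brute_force_assignment(const std::vector<std::vector<double>>& cost,
                              std::vector<int>& best_col_of_row) {
    int n = (int)cost.size();
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);  // 0, 1, ..., n-1
    double best = 1e18;
    do {
        double total = 0.0;
        for (int i = 0; i < n; i++) total += cost[i][perm[i]];
        if (total < best) { best = total; best_col_of_row = perm; }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}
```

For a 3x3 cost matrix such as `{{4, 1, 3}, {2, 0, 5}, {3, 2, 2}}` the optimal matching assigns row 0 to column 1, row 1 to column 0, and row 2 to column 2, for a total cost of 5.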
deploy/TensorRT/cpp/include/logging.h ADDED
@@ -0,0 +1,503 @@
+ /*
+  * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+  *
+  * Licensed under the Apache License, Version 2.0 (the "License");
+  * you may not use this file except in compliance with the License.
+  * You may obtain a copy of the License at
+  *
+  *     http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+
+ #ifndef TENSORRT_LOGGING_H
+ #define TENSORRT_LOGGING_H
+
+ #include "NvInferRuntimeCommon.h"
+ #include <cassert>
+ #include <ctime>
+ #include <iomanip>
+ #include <iostream>
+ #include <ostream>
+ #include <sstream>
+ #include <string>
+
+ using Severity = nvinfer1::ILogger::Severity;
+
+ class LogStreamConsumerBuffer : public std::stringbuf
+ {
+ public:
+     LogStreamConsumerBuffer(std::ostream& stream, const std::string& prefix, bool shouldLog)
+         : mOutput(stream)
+         , mPrefix(prefix)
+         , mShouldLog(shouldLog)
+     {
+     }
+
+     LogStreamConsumerBuffer(LogStreamConsumerBuffer&& other)
+         : mOutput(other.mOutput)
+     {
+     }
+
+     ~LogStreamConsumerBuffer()
+     {
+         // std::streambuf::pbase() gives a pointer to the beginning of the buffered part of the output sequence
+         // std::streambuf::pptr() gives a pointer to the current position of the output sequence
+         // if the pointer to the beginning is not equal to the pointer to the current position,
+         // call putOutput() to log the output to the stream
+         if (pbase() != pptr())
+         {
+             putOutput();
+         }
+     }
+
+     // synchronizes the stream buffer and returns 0 on success
+     // synchronizing the stream buffer consists of inserting the buffer contents into the stream,
+     // resetting the buffer and flushing the stream
+     virtual int sync()
+     {
+         putOutput();
+         return 0;
+     }
+
+     void putOutput()
+     {
+         if (mShouldLog)
+         {
+             // prepend timestamp
+             std::time_t timestamp = std::time(nullptr);
+             tm* tm_local = std::localtime(&timestamp);
+             std::cout << "[";
+             std::cout << std::setw(2) << std::setfill('0') << 1 + tm_local->tm_mon << "/";
+             std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_mday << "/";
+             std::cout << std::setw(4) << std::setfill('0') << 1900 + tm_local->tm_year << "-";
+             std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_hour << ":";
+             std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_min << ":";
+             std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_sec << "] ";
+             // std::stringbuf::str() gets the string contents of the buffer
+             // insert the buffer contents pre-appended by the appropriate prefix into the stream
+             mOutput << mPrefix << str();
+             // set the buffer to empty
+             str("");
+             // flush the stream
+             mOutput.flush();
+         }
+     }
+
+     void setShouldLog(bool shouldLog)
+     {
+         mShouldLog = shouldLog;
+     }
+
+ private:
+     std::ostream& mOutput;
+     std::string mPrefix;
+     bool mShouldLog;
+ };
+
+ //!
+ //! \class LogStreamConsumerBase
+ //! \brief Convenience object used to initialize LogStreamConsumerBuffer before std::ostream in LogStreamConsumer
+ //!
+ class LogStreamConsumerBase
+ {
+ public:
+     LogStreamConsumerBase(std::ostream& stream, const std::string& prefix, bool shouldLog)
+         : mBuffer(stream, prefix, shouldLog)
+     {
+     }
+
+ protected:
+     LogStreamConsumerBuffer mBuffer;
+ };
+
+ //!
+ //! \class LogStreamConsumer
+ //! \brief Convenience object used to facilitate use of C++ stream syntax when logging messages.
+ //!  Order of base classes is LogStreamConsumerBase and then std::ostream.
+ //!  This is because the LogStreamConsumerBase class is used to initialize the LogStreamConsumerBuffer member field
+ //!  in LogStreamConsumer and then the address of the buffer is passed to std::ostream.
+ //!  This is necessary to prevent the address of an uninitialized buffer from being passed to std::ostream.
+ //!  Please do not change the order of the parent classes.
+ //!
+ class LogStreamConsumer : protected LogStreamConsumerBase, public std::ostream
+ {
+ public:
+     //! \brief Creates a LogStreamConsumer which logs messages with level severity.
+     //!  Reportable severity determines if the messages are severe enough to be logged.
+     LogStreamConsumer(Severity reportableSeverity, Severity severity)
+         : LogStreamConsumerBase(severityOstream(severity), severityPrefix(severity), severity <= reportableSeverity)
+         , std::ostream(&mBuffer) // links the stream buffer with the stream
+         , mShouldLog(severity <= reportableSeverity)
+         , mSeverity(severity)
+     {
+     }
+
+     LogStreamConsumer(LogStreamConsumer&& other)
+         : LogStreamConsumerBase(severityOstream(other.mSeverity), severityPrefix(other.mSeverity), other.mShouldLog)
+         , std::ostream(&mBuffer) // links the stream buffer with the stream
+         , mShouldLog(other.mShouldLog)
+         , mSeverity(other.mSeverity)
+     {
+     }
+
+     void setReportableSeverity(Severity reportableSeverity)
+     {
+         mShouldLog = mSeverity <= reportableSeverity;
+         mBuffer.setShouldLog(mShouldLog);
+     }
+
+ private:
+     static std::ostream& severityOstream(Severity severity)
+     {
+         return severity >= Severity::kINFO ? std::cout : std::cerr;
+     }
+
+     static std::string severityPrefix(Severity severity)
+     {
+         switch (severity)
+         {
+         case Severity::kINTERNAL_ERROR: return "[F] ";
+         case Severity::kERROR: return "[E] ";
+         case Severity::kWARNING: return "[W] ";
+         case Severity::kINFO: return "[I] ";
+         case Severity::kVERBOSE: return "[V] ";
+         default: assert(0); return "";
+         }
+     }
+
+     bool mShouldLog;
+     Severity mSeverity;
+ };
+
+ //! \class Logger
+ //!
+ //! \brief Class which manages logging of TensorRT tools and samples
+ //!
+ //! \details This class provides a common interface for TensorRT tools and samples to log information to the console,
+ //! and supports logging two types of messages:
+ //!
+ //! - Debugging messages with an associated severity (info, warning, error, or internal error/fatal)
+ //! - Test pass/fail messages
+ //!
+ //! The advantage of having all samples use this class for logging as opposed to emitting directly to stdout/stderr is
+ //! that the logic for controlling the verbosity and formatting of sample output is centralized in one location.
+ //!
+ //! In the future, this class could be extended to support dumping test results to a file in some standard format
+ //! (for example, JUnit XML), and providing additional metadata (e.g. timing the duration of a test run).
+ //!
+ //! TODO: For backwards compatibility with existing samples, this class inherits directly from the nvinfer1::ILogger
+ //! interface, which is problematic since there isn't a clean separation between messages coming from the TensorRT
+ //! library and messages coming from the sample.
+ //!
+ //! In the future (once all samples are updated to use Logger::getTRTLogger() to access the ILogger) we can refactor the
+ //! class to eliminate the inheritance and instead make the nvinfer1::ILogger implementation a member of the Logger
+ //! object.
+
+ class Logger : public nvinfer1::ILogger
+ {
+ public:
+     Logger(Severity severity = Severity::kWARNING)
+         : mReportableSeverity(severity)
+     {
+     }
+
+     //!
+     //! \enum TestResult
+     //! \brief Represents the state of a given test
+     //!
+     enum class TestResult
+     {
+         kRUNNING, //!< The test is running
+         kPASSED,  //!< The test passed
+         kFAILED,  //!< The test failed
+         kWAIVED   //!< The test was waived
+     };
+
+     //!
+     //! \brief Forward-compatible method for retrieving the nvinfer::ILogger associated with this Logger
+     //! \return The nvinfer1::ILogger associated with this Logger
+     //!
+     //! TODO Once all samples are updated to use this method to register the logger with TensorRT,
+     //! we can eliminate the inheritance of Logger from ILogger
+     //!
+     nvinfer1::ILogger& getTRTLogger()
+     {
+         return *this;
+     }
+
+     //!
+     //! \brief Implementation of the nvinfer1::ILogger::log() virtual method
+     //!
+     //! Note samples should not be calling this function directly; it will eventually go away once we eliminate the
+     //! inheritance from nvinfer1::ILogger
+     //!
+     void log(Severity severity, const char* msg) noexcept override
+     {
+         LogStreamConsumer(mReportableSeverity, severity) << "[TRT] " << std::string(msg) << std::endl;
+     }
+
+     //!
+     //! \brief Method for controlling the verbosity of logging output
+     //!
+     //! \param severity The logger will only emit messages that have severity of this level or higher.
+     //!
+     void setReportableSeverity(Severity severity)
+     {
+         mReportableSeverity = severity;
+     }
+
+     //!
+     //! \brief Opaque handle that holds logging information for a particular test
+     //!
+     //! This object is an opaque handle to information used by the Logger to print test results.
+     //! The sample must call Logger::defineTest() in order to obtain a TestAtom that can be used
+     //! with Logger::reportTest{Start,End}().
+     //!
+     class TestAtom
+     {
+     public:
+         TestAtom(TestAtom&&) = default;
+
+     private:
+         friend class Logger;
+
+         TestAtom(bool started, const std::string& name, const std::string& cmdline)
+             : mStarted(started)
+             , mName(name)
+             , mCmdline(cmdline)
+         {
+         }
+
+         bool mStarted;
+         std::string mName;
+         std::string mCmdline;
+     };
+
+     //!
+     //! \brief Define a test for logging
+     //!
+     //! \param[in] name The name of the test. This should be a string starting with
+     //!                  "TensorRT" and containing dot-separated strings containing
+     //!                  the characters [A-Za-z0-9_].
+     //!                  For example, "TensorRT.sample_googlenet"
+     //! \param[in] cmdline The command line used to reproduce the test
+     //
+     //! \return a TestAtom that can be used in Logger::reportTest{Start,End}().
+     //!
+     static TestAtom defineTest(const std::string& name, const std::string& cmdline)
+     {
+         return TestAtom(false, name, cmdline);
+     }
+
+     //!
+     //! \brief A convenience overloaded version of defineTest() that accepts an array of command-line arguments
+     //!        as input
+     //!
+     //! \param[in] name The name of the test
+     //! \param[in] argc The number of command-line arguments
+     //! \param[in] argv The array of command-line arguments (given as C strings)
+     //!
+     //! \return a TestAtom that can be used in Logger::reportTest{Start,End}().
+     static TestAtom defineTest(const std::string& name, int argc, char const* const* argv)
+     {
+         auto cmdline = genCmdlineString(argc, argv);
+         return defineTest(name, cmdline);
+     }
+
+     //!
+     //! \brief Report that a test has started.
+     //!
+     //! \pre reportTestStart() has not been called yet for the given testAtom
+     //!
+     //! \param[in] testAtom The handle to the test that has started
+     //!
+     static void reportTestStart(TestAtom& testAtom)
+     {
+         reportTestResult(testAtom, TestResult::kRUNNING);
+         assert(!testAtom.mStarted);
+         testAtom.mStarted = true;
+     }
+
+     //!
+     //! \brief Report that a test has ended.
+     //!
+     //! \pre reportTestStart() has been called for the given testAtom
+     //!
+     //! \param[in] testAtom The handle to the test that has ended
+     //! \param[in] result The result of the test. Should be one of TestResult::kPASSED,
+     //!                   TestResult::kFAILED, TestResult::kWAIVED
+     //!
+     static void reportTestEnd(const TestAtom& testAtom, TestResult result)
+     {
+         assert(result != TestResult::kRUNNING);
+         assert(testAtom.mStarted);
+         reportTestResult(testAtom, result);
+     }
+
+     static int reportPass(const TestAtom& testAtom)
+     {
+         reportTestEnd(testAtom, TestResult::kPASSED);
+         return EXIT_SUCCESS;
+     }
+
+     static int reportFail(const TestAtom& testAtom)
+     {
+         reportTestEnd(testAtom, TestResult::kFAILED);
+         return EXIT_FAILURE;
+     }
+
+     static int reportWaive(const TestAtom& testAtom)
+     {
+         reportTestEnd(testAtom, TestResult::kWAIVED);
+         return EXIT_SUCCESS;
+     }
+
+     static int reportTest(const TestAtom& testAtom, bool pass)
+     {
+         return pass ? reportPass(testAtom) : reportFail(testAtom);
+     }
+
+     Severity getReportableSeverity() const
+     {
+         return mReportableSeverity;
+     }
+
+ private:
+     //!
+     //! \brief returns an appropriate string for prefixing a log message with the given severity
+     //!
+     static const char* severityPrefix(Severity severity)
+     {
+         switch (severity)
+         {
+         case Severity::kINTERNAL_ERROR: return "[F] ";
+         case Severity::kERROR: return "[E] ";
+         case Severity::kWARNING: return "[W] ";
+         case Severity::kINFO: return "[I] ";
+         case Severity::kVERBOSE: return "[V] ";
+         default: assert(0); return "";
+         }
+     }
+
+     //!
+     //! \brief returns an appropriate string for prefixing a test result message with the given result
+     //!
+     static const char* testResultString(TestResult result)
+     {
+         switch (result)
+         {
+         case TestResult::kRUNNING: return "RUNNING";
+         case TestResult::kPASSED: return "PASSED";
+         case TestResult::kFAILED: return "FAILED";
+         case TestResult::kWAIVED: return "WAIVED";
+         default: assert(0); return "";
+         }
+     }
+
+     //!
+     //! \brief returns an appropriate output stream (cout or cerr) to use with the given severity
+     //!
+     static std::ostream& severityOstream(Severity severity)
+     {
+         return severity >= Severity::kINFO ? std::cout : std::cerr;
+     }
+
+     //!
+     //! \brief method that implements logging test results
+     //!
+     static void reportTestResult(const TestAtom& testAtom, TestResult result)
+     {
+         severityOstream(Severity::kINFO) << "&&&& " << testResultString(result) << " " << testAtom.mName << " # "
+                                          << testAtom.mCmdline << std::endl;
+     }
+
+     //!
+     //! \brief generate a command line string from the given (argc, argv) values
+     //!
+     static std::string genCmdlineString(int argc, char const* const* argv)
+     {
+         std::stringstream ss;
+         for (int i = 0; i < argc; i++)
+         {
+             if (i > 0)
+                 ss << " ";
+             ss << argv[i];
+         }
+         return ss.str();
+     }
+
+     Severity mReportableSeverity;
+ };
+
+ namespace
+ {
+
+ //!
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kVERBOSE
+ //!
+ //! Example usage:
+ //!
+ //!     LOG_VERBOSE(logger) << "hello world" << std::endl;
+ //!
+ inline LogStreamConsumer LOG_VERBOSE(const Logger& logger)
+ {
+     return LogStreamConsumer(logger.getReportableSeverity(), Severity::kVERBOSE);
+ }
+
+ //!
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kINFO
+ //!
+ //! Example usage:
+ //!
+ //!     LOG_INFO(logger) << "hello world" << std::endl;
+ //!
+ inline LogStreamConsumer LOG_INFO(const Logger& logger)
+ {
+     return LogStreamConsumer(logger.getReportableSeverity(), Severity::kINFO);
+ }
+
+ //!
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kWARNING
+ //!
+ //! Example usage:
+ //!
+ //!     LOG_WARN(logger) << "hello world" << std::endl;
+ //!
+ inline LogStreamConsumer LOG_WARN(const Logger& logger)
+ {
+     return LogStreamConsumer(logger.getReportableSeverity(), Severity::kWARNING);
+ }
+
+ //!
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kERROR
+ //!
+ //! Example usage:
+ //!
+ //!     LOG_ERROR(logger) << "hello world" << std::endl;
+ //!
+ inline LogStreamConsumer LOG_ERROR(const Logger& logger)
+ {
+     return LogStreamConsumer(logger.getReportableSeverity(), Severity::kERROR);
+ }
+
+ //!
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kINTERNAL_ERROR
+ // ("fatal" severity)
+ //!
+ //! Example usage:
+ //!
+ //!     LOG_FATAL(logger) << "hello world" << std::endl;
+ //!
+ inline LogStreamConsumer LOG_FATAL(const Logger& logger)
+ {
+     return LogStreamConsumer(logger.getReportableSeverity(), Severity::kINTERNAL_ERROR);
+ }
+
+ } // anonymous namespace
+
+ #endif // TENSORRT_LOGGING_H
deploy/TensorRT/cpp/src/BYTETracker.cpp ADDED
@@ -0,0 +1,241 @@
+ #include "BYTETracker.h"
+ #include <fstream>
+
+ BYTETracker::BYTETracker(int frame_rate, int track_buffer)
+ {
+     track_thresh = 0.5;
+     high_thresh = 0.6;
+     match_thresh = 0.8;
+
+     frame_id = 0;
+     max_time_lost = int(frame_rate / 30.0 * track_buffer);
+     cout << "Init ByteTrack!" << endl;
+ }
+
+ BYTETracker::~BYTETracker()
+ {
+ }
+
+ vector<STrack> BYTETracker::update(const vector<Object>& objects)
+ {
+     ////////////////// Step 1: Get detections //////////////////
+     this->frame_id++;
+     vector<STrack> activated_stracks;
+     vector<STrack> refind_stracks;
+     vector<STrack> removed_stracks;
+     vector<STrack> lost_stracks;
+     vector<STrack> detections;
+     vector<STrack> detections_low;
+
+     vector<STrack> detections_cp;
+     vector<STrack> tracked_stracks_swap;
+     vector<STrack> resa, resb;
+     vector<STrack> output_stracks;
+
+     vector<STrack*> unconfirmed;
+     vector<STrack*> tracked_stracks;
+     vector<STrack*> strack_pool;
+     vector<STrack*> r_tracked_stracks;
+
+     if (objects.size() > 0)
+     {
+         for (int i = 0; i < objects.size(); i++)
+         {
+             vector<float> tlbr_;
+             tlbr_.resize(4);
+             tlbr_[0] = objects[i].rect.x;
+             tlbr_[1] = objects[i].rect.y;
+             tlbr_[2] = objects[i].rect.x + objects[i].rect.width;
+             tlbr_[3] = objects[i].rect.y + objects[i].rect.height;
+
+             float score = objects[i].prob;
+
+             STrack strack(STrack::tlbr_to_tlwh(tlbr_), score);
+             if (score >= track_thresh)
+             {
+                 detections.push_back(strack);
+             }
+             else
+             {
+                 detections_low.push_back(strack);
+             }
+         }
+     }
+
+     // Add newly detected tracklets to tracked_stracks
+     for (int i = 0; i < this->tracked_stracks.size(); i++)
+     {
+         if (!this->tracked_stracks[i].is_activated)
+             unconfirmed.push_back(&this->tracked_stracks[i]);
+         else
+             tracked_stracks.push_back(&this->tracked_stracks[i]);
+     }
+
+     ////////////////// Step 2: First association, with IoU //////////////////
+     strack_pool = joint_stracks(tracked_stracks, this->lost_stracks);
+     STrack::multi_predict(strack_pool, this->kalman_filter);
+
+     vector<vector<float> > dists;
+     int dist_size = 0, dist_size_size = 0;
+     dists = iou_distance(strack_pool, detections, dist_size, dist_size_size);
+
+     vector<vector<int> > matches;
+     vector<int> u_track, u_detection;
+     linear_assignment(dists, dist_size, dist_size_size, match_thresh, matches, u_track, u_detection);
+
+     for (int i = 0; i < matches.size(); i++)
+     {
+         STrack *track = strack_pool[matches[i][0]];
+         STrack *det = &detections[matches[i][1]];
+         if (track->state == TrackState::Tracked)
+         {
+             track->update(*det, this->frame_id);
+             activated_stracks.push_back(*track);
+         }
+         else
+         {
+             track->re_activate(*det, this->frame_id, false);
+             refind_stracks.push_back(*track);
+         }
+     }
+
+     ////////////////// Step 3: Second association, using low score dets //////////////////
+     for (int i = 0; i < u_detection.size(); i++)
+     {
+         detections_cp.push_back(detections[u_detection[i]]);
+     }
+     detections.clear();
+     detections.assign(detections_low.begin(), detections_low.end());
+
+     for (int i = 0; i < u_track.size(); i++)
+     {
+         if (strack_pool[u_track[i]]->state == TrackState::Tracked)
+         {
+             r_tracked_stracks.push_back(strack_pool[u_track[i]]);
+         }
+     }
+
+     dists.clear();
+     dists = iou_distance(r_tracked_stracks, detections, dist_size, dist_size_size);
+
+     matches.clear();
+     u_track.clear();
+     u_detection.clear();
+     linear_assignment(dists, dist_size, dist_size_size, 0.5, matches, u_track, u_detection);
+
+     for (int i = 0; i < matches.size(); i++)
+     {
+         STrack *track = r_tracked_stracks[matches[i][0]];
+         STrack *det = &detections[matches[i][1]];
+         if (track->state == TrackState::Tracked)
+         {
+             track->update(*det, this->frame_id);
+             activated_stracks.push_back(*track);
+         }
+         else
+         {
+             track->re_activate(*det, this->frame_id, false);
+             refind_stracks.push_back(*track);
+         }
+     }
+
+     for (int i = 0; i < u_track.size(); i++)
+     {
+         STrack *track = r_tracked_stracks[u_track[i]];
+         if (track->state != TrackState::Lost)
+         {
+             track->mark_lost();
+             lost_stracks.push_back(*track);
+         }
+     }
+
+     // Deal with unconfirmed tracks, usually tracks with only one beginning frame
+     detections.clear();
+     detections.assign(detections_cp.begin(), detections_cp.end());
+
+     dists.clear();
+     dists = iou_distance(unconfirmed, detections, dist_size, dist_size_size);
+
+     matches.clear();
+     vector<int> u_unconfirmed;
+     u_detection.clear();
+     linear_assignment(dists, dist_size, dist_size_size, 0.7, matches, u_unconfirmed, u_detection);
+
+     for (int i = 0; i < matches.size(); i++)
+     {
+         unconfirmed[matches[i][0]]->update(detections[matches[i][1]], this->frame_id);
+         activated_stracks.push_back(*unconfirmed[matches[i][0]]);
+     }
+
+     for (int i = 0; i < u_unconfirmed.size(); i++)
+     {
+         STrack *track = unconfirmed[u_unconfirmed[i]];
+         track->mark_removed();
+         removed_stracks.push_back(*track);
+     }
+
+     ////////////////// Step 4: Init new stracks //////////////////
+     for (int i = 0; i < u_detection.size(); i++)
+     {
+         STrack *track = &detections[u_detection[i]];
+         if (track->score < this->high_thresh)
+             continue;
+         track->activate(this->kalman_filter, this->frame_id);
+         activated_stracks.push_back(*track);
+     }
+
+     ////////////////// Step 5: Update state //////////////////
+     for (int i = 0; i < this->lost_stracks.size(); i++)
+     {
+         if (this->frame_id - this->lost_stracks[i].end_frame() > this->max_time_lost)
+         {
+             this->lost_stracks[i].mark_removed();
+             removed_stracks.push_back(this->lost_stracks[i]);
+         }
+     }
+
+     for (int i = 0; i < this->tracked_stracks.size(); i++)
+     {
+         if (this->tracked_stracks[i].state == TrackState::Tracked)
+         {
+             tracked_stracks_swap.push_back(this->tracked_stracks[i]);
+         }
+     }
+     this->tracked_stracks.clear();
+     this->tracked_stracks.assign(tracked_stracks_swap.begin(), tracked_stracks_swap.end());
+
+     this->tracked_stracks = joint_stracks(this->tracked_stracks, activated_stracks);
+     this->tracked_stracks = joint_stracks(this->tracked_stracks, refind_stracks);
+
+     //std::cout << activated_stracks.size() << std::endl;
+
+     this->lost_stracks = sub_stracks(this->lost_stracks, this->tracked_stracks);
+     for (int i = 0; i < lost_stracks.size(); i++)
+     {
+         this->lost_stracks.push_back(lost_stracks[i]);
+     }
+
+     this->lost_stracks = sub_stracks(this->lost_stracks, this->removed_stracks);
+     for (int i = 0; i < removed_stracks.size(); i++)
+     {
+         this->removed_stracks.push_back(removed_stracks[i]);
+     }
+
+     remove_duplicate_stracks(resa, resb, this->tracked_stracks, this->lost_stracks);
+
+     this->tracked_stracks.clear();
+     this->tracked_stracks.assign(resa.begin(), resa.end());
+     this->lost_stracks.clear();
+     this->lost_stracks.assign(resb.begin(), resb.end());
+
+     for (int i = 0; i < this->tracked_stracks.size(); i++)
+     {
+         if (this->tracked_stracks[i].is_activated)
+         {
+             output_stracks.push_back(this->tracked_stracks[i]);
+         }
+     }
+     return output_stracks;
+ }
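Step 1 above converts detector boxes from tlbr (top-left / bottom-right corners) to tlwh (top-left / width / height) via `STrack::tlbr_to_tlwh`, and `STrack` later converts tlwh to the xyah form (center x, center y, aspect ratio w/h, height) that the Kalman filter tracks. Standalone free-function versions of the two conversions, written here only to show the arithmetic (the repo implements them as `STrack` members with slightly different signatures):

```cpp
#include <cassert>
#include <vector>

// tlbr {x1, y1, x2, y2} -> tlwh {x, y, w, h}
std::vector<float> tlbr_to_tlwh(const std::vector<float>& tlbr) {
    return {tlbr[0], tlbr[1], tlbr[2] - tlbr[0], tlbr[3] - tlbr[1]};
}

// tlwh {x, y, w, h} -> xyah {center_x, center_y, w/h, h}
std::vector<float> tlwh_to_xyah(const std::vector<float>& tlwh) {
    return {tlwh[0] + tlwh[2] / 2, tlwh[1] + tlwh[3] / 2,
            tlwh[2] / tlwh[3], tlwh[3]};
}
```

For example, the tlbr box `{10, 20, 50, 100}` becomes tlwh `{10, 20, 40, 80}` and then xyah `{30, 60, 0.5, 80}`.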
deploy/TensorRT/cpp/src/STrack.cpp ADDED
@@ -0,0 +1,192 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ #include "STrack.h"
+
+ STrack::STrack(vector<float> tlwh_, float score)
+ {
+     _tlwh.resize(4);
+     _tlwh.assign(tlwh_.begin(), tlwh_.end());
+
+     is_activated = false;
+     track_id = 0;
+     state = TrackState::New;
+
+     tlwh.resize(4);
+     tlbr.resize(4);
+
+     static_tlwh();
+     static_tlbr();
+     frame_id = 0;
+     tracklet_len = 0;
+     this->score = score;
+     start_frame = 0;
+ }
+
+ STrack::~STrack()
+ {
+ }
+
+ void STrack::activate(byte_kalman::KalmanFilter &kalman_filter, int frame_id)
+ {
+     this->kalman_filter = kalman_filter;
+     this->track_id = this->next_id();
+
+     vector<float> _tlwh_tmp(4);
+     _tlwh_tmp[0] = this->_tlwh[0];
+     _tlwh_tmp[1] = this->_tlwh[1];
+     _tlwh_tmp[2] = this->_tlwh[2];
+     _tlwh_tmp[3] = this->_tlwh[3];
+     vector<float> xyah = tlwh_to_xyah(_tlwh_tmp);
+     DETECTBOX xyah_box;
+     xyah_box[0] = xyah[0];
+     xyah_box[1] = xyah[1];
+     xyah_box[2] = xyah[2];
+     xyah_box[3] = xyah[3];
+     auto mc = this->kalman_filter.initiate(xyah_box);
+     this->mean = mc.first;
+     this->covariance = mc.second;
+
+     static_tlwh();
+     static_tlbr();
+
+     this->tracklet_len = 0;
+     this->state = TrackState::Tracked;
+     if (frame_id == 1)
+     {
+         this->is_activated = true;
+     }
+     //this->is_activated = true;
+     this->frame_id = frame_id;
+     this->start_frame = frame_id;
+ }
+
+ void STrack::re_activate(STrack &new_track, int frame_id, bool new_id)
+ {
+     vector<float> xyah = tlwh_to_xyah(new_track.tlwh);
+     DETECTBOX xyah_box;
+     xyah_box[0] = xyah[0];
+     xyah_box[1] = xyah[1];
+     xyah_box[2] = xyah[2];
+     xyah_box[3] = xyah[3];
+     auto mc = this->kalman_filter.update(this->mean, this->covariance, xyah_box);
+     this->mean = mc.first;
+     this->covariance = mc.second;
+
+     static_tlwh();
+     static_tlbr();
+
+     this->tracklet_len = 0;
+     this->state = TrackState::Tracked;
+     this->is_activated = true;
+     this->frame_id = frame_id;
+     this->score = new_track.score;
+     if (new_id)
+         this->track_id = next_id();
+ }
+
+ void STrack::update(STrack &new_track, int frame_id)
+ {
+     this->frame_id = frame_id;
+     this->tracklet_len++;
+
+     vector<float> xyah = tlwh_to_xyah(new_track.tlwh);
+     DETECTBOX xyah_box;
+     xyah_box[0] = xyah[0];
+     xyah_box[1] = xyah[1];
+     xyah_box[2] = xyah[2];
+     xyah_box[3] = xyah[3];
+
+     auto mc = this->kalman_filter.update(this->mean, this->covariance, xyah_box);
+     this->mean = mc.first;
+     this->covariance = mc.second;
+
+     static_tlwh();
+     static_tlbr();
+
+     this->state = TrackState::Tracked;
+     this->is_activated = true;
+
+     this->score = new_track.score;
+ }
+
+ void STrack::static_tlwh()
+ {
+     if (this->state == TrackState::New)
+     {
+         tlwh[0] = _tlwh[0];
+         tlwh[1] = _tlwh[1];
+         tlwh[2] = _tlwh[2];
+         tlwh[3] = _tlwh[3];
+         return;
+     }
+
+     tlwh[0] = mean[0];
+     tlwh[1] = mean[1];
+     tlwh[2] = mean[2];
+     tlwh[3] = mean[3];
+
+     tlwh[2] *= tlwh[3];
+     tlwh[0] -= tlwh[2] / 2;
+     tlwh[1] -= tlwh[3] / 2;
+ }
+
+ void STrack::static_tlbr()
+ {
+     tlbr.clear();
+     tlbr.assign(tlwh.begin(), tlwh.end());
+     tlbr[2] += tlbr[0];
+     tlbr[3] += tlbr[1];
+ }
+
+ vector<float> STrack::tlwh_to_xyah(vector<float> tlwh_tmp)
+ {
+     vector<float> tlwh_output = tlwh_tmp;
+     tlwh_output[0] += tlwh_output[2] / 2;
+     tlwh_output[1] += tlwh_output[3] / 2;
+     tlwh_output[2] /= tlwh_output[3];
+     return tlwh_output;
+ }
+
+ vector<float> STrack::to_xyah()
+ {
+     return tlwh_to_xyah(tlwh);
+ }
+
+ vector<float> STrack::tlbr_to_tlwh(vector<float> &tlbr)
+ {
+     tlbr[2] -= tlbr[0];
+     tlbr[3] -= tlbr[1];
+     return tlbr;
+ }
+
+ void STrack::mark_lost()
+ {
+     state = TrackState::Lost;
+ }
+
+ void STrack::mark_removed()
+ {
+     state = TrackState::Removed;
+ }
+
+ int STrack::next_id()
+ {
+     static int _count = 0;
+     _count++;
+     return _count;
+ }
+
+ int STrack::end_frame()
+ {
+     return this->frame_id;
+ }
+
+ void STrack::multi_predict(vector<STrack*> &stracks, byte_kalman::KalmanFilter &kalman_filter)
+ {
+     for (int i = 0; i < stracks.size(); i++)
+     {
+         if (stracks[i]->state != TrackState::Tracked)
+         {
+             stracks[i]->mean[7] = 0;
+         }
+         kalman_filter.predict(stracks[i]->mean, stracks[i]->covariance);
+     }
+ }
deploy/TensorRT/cpp/src/bytetrack.cpp ADDED
@@ -0,0 +1,505 @@
+ #include <fstream>
+ #include <iostream>
+ #include <sstream>
+ #include <numeric>
+ #include <chrono>
+ #include <vector>
+ #include <opencv2/opencv.hpp>
+ #include <dirent.h>
+ #include "NvInfer.h"
+ #include "cuda_runtime_api.h"
+ #include "logging.h"
+ #include "BYTETracker.h"
+
+ #define CHECK(status) \
+     do\
+     {\
+         auto ret = (status);\
+         if (ret != 0)\
+         {\
+             cerr << "Cuda failure: " << ret << endl;\
+             abort();\
+         }\
+     } while (0)
+
+ #define DEVICE 0  // GPU id
+ #define NMS_THRESH 0.7
+ #define BBOX_CONF_THRESH 0.1
+
+ using namespace nvinfer1;
+
+ // stuff we know about the network and the input/output blobs
+ static const int INPUT_W = 1088;
+ static const int INPUT_H = 608;
+ const char* INPUT_BLOB_NAME = "input_0";
+ const char* OUTPUT_BLOB_NAME = "output_0";
+ static Logger gLogger;
+
+ Mat static_resize(Mat& img) {
+     float r = min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
+     // r = std::min(r, 1.0f);
+     int unpad_w = r * img.cols;
+     int unpad_h = r * img.rows;
+     Mat re(unpad_h, unpad_w, CV_8UC3);
+     resize(img, re, re.size());
+     Mat out(INPUT_H, INPUT_W, CV_8UC3, Scalar(114, 114, 114));
+     re.copyTo(out(Rect(0, 0, re.cols, re.rows)));
+     return out;
+ }
+
+ struct GridAndStride
+ {
+     int grid0;
+     int grid1;
+     int stride;
+ };
+
+ static void generate_grids_and_stride(const int target_w, const int target_h, vector<int>& strides, vector<GridAndStride>& grid_strides)
+ {
+     for (auto stride : strides)
+     {
+         int num_grid_w = target_w / stride;
+         int num_grid_h = target_h / stride;
+         for (int g1 = 0; g1 < num_grid_h; g1++)
+         {
+             for (int g0 = 0; g0 < num_grid_w; g0++)
+             {
+                 grid_strides.push_back((GridAndStride){g0, g1, stride});
+             }
+         }
+     }
+ }
+
+ static inline float intersection_area(const Object& a, const Object& b)
+ {
+     Rect_<float> inter = a.rect & b.rect;
+     return inter.area();
+ }
+
+ static void qsort_descent_inplace(vector<Object>& faceobjects, int left, int right)
+ {
+     int i = left;
+     int j = right;
+     float p = faceobjects[(left + right) / 2].prob;
+
+     while (i <= j)
+     {
+         while (faceobjects[i].prob > p)
+             i++;
+
+         while (faceobjects[j].prob < p)
+             j--;
+
+         if (i <= j)
+         {
+             // swap
+             swap(faceobjects[i], faceobjects[j]);
+
+             i++;
+             j--;
+         }
+     }
+
+     #pragma omp parallel sections
+     {
+         #pragma omp section
+         {
+             if (left < j) qsort_descent_inplace(faceobjects, left, j);
+         }
+         #pragma omp section
+         {
+             if (i < right) qsort_descent_inplace(faceobjects, i, right);
+         }
+     }
+ }
+
+ static void qsort_descent_inplace(vector<Object>& objects)
+ {
+     if (objects.empty())
+         return;
+
+     qsort_descent_inplace(objects, 0, objects.size() - 1);
+ }
+
+ static void nms_sorted_bboxes(const vector<Object>& faceobjects, vector<int>& picked, float nms_threshold)
+ {
+     picked.clear();
+
+     const int n = faceobjects.size();
+
+     vector<float> areas(n);
+     for (int i = 0; i < n; i++)
+     {
+         areas[i] = faceobjects[i].rect.area();
+     }
+
+     for (int i = 0; i < n; i++)
+     {
+         const Object& a = faceobjects[i];
+
+         int keep = 1;
+         for (int j = 0; j < (int)picked.size(); j++)
+         {
+             const Object& b = faceobjects[picked[j]];
+
+             // intersection over union
+             float inter_area = intersection_area(a, b);
+             float union_area = areas[i] + areas[picked[j]] - inter_area;
+             // float IoU = inter_area / union_area
+             if (inter_area / union_area > nms_threshold)
+                 keep = 0;
+         }
+
+         if (keep)
+             picked.push_back(i);
+     }
+ }
+
+
+ static void generate_yolox_proposals(vector<GridAndStride> grid_strides, float* feat_blob, float prob_threshold, vector<Object>& objects)
+ {
+     const int num_class = 1;
+
+     const int num_anchors = grid_strides.size();
+
+     for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
+     {
+         const int grid0 = grid_strides[anchor_idx].grid0;
+         const int grid1 = grid_strides[anchor_idx].grid1;
+         const int stride = grid_strides[anchor_idx].stride;
+
+         const int basic_pos = anchor_idx * (num_class + 5);
+
+         // yolox/models/yolo_head.py decode logic
+         float x_center = (feat_blob[basic_pos+0] + grid0) * stride;
+         float y_center = (feat_blob[basic_pos+1] + grid1) * stride;
+         float w = exp(feat_blob[basic_pos+2]) * stride;
+         float h = exp(feat_blob[basic_pos+3]) * stride;
+         float x0 = x_center - w * 0.5f;
+         float y0 = y_center - h * 0.5f;
+
+         float box_objectness = feat_blob[basic_pos+4];
+         for (int class_idx = 0; class_idx < num_class; class_idx++)
+         {
+             float box_cls_score = feat_blob[basic_pos + 5 + class_idx];
+             float box_prob = box_objectness * box_cls_score;
+             if (box_prob > prob_threshold)
+             {
+                 Object obj;
+                 obj.rect.x = x0;
+                 obj.rect.y = y0;
+                 obj.rect.width = w;
+                 obj.rect.height = h;
+                 obj.label = class_idx;
+                 obj.prob = box_prob;
+
+                 objects.push_back(obj);
+             }
+
+         } // class loop
+
+     } // point anchor loop
+ }
+
+ float* blobFromImage(Mat& img){
+     cvtColor(img, img, COLOR_BGR2RGB);
+
+     float* blob = new float[img.total()*3];
+     int channels = 3;
+     int img_h = img.rows;
+     int img_w = img.cols;
+     vector<float> mean = {0.485, 0.456, 0.406};
+     vector<float> std = {0.229, 0.224, 0.225};
+     for (size_t c = 0; c < channels; c++)
+     {
+         for (size_t h = 0; h < img_h; h++)
+         {
+             for (size_t w = 0; w < img_w; w++)
+             {
+                 blob[c * img_w * img_h + h * img_w + w] =
+                     (((float)img.at<Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
+             }
+         }
+     }
+     return blob;
+ }
+
+
+ static void decode_outputs(float* prob, vector<Object>& objects, float scale, const int img_w, const int img_h) {
+     vector<Object> proposals;
+     vector<int> strides = {8, 16, 32};
+     vector<GridAndStride> grid_strides;
+     generate_grids_and_stride(INPUT_W, INPUT_H, strides, grid_strides);
+     generate_yolox_proposals(grid_strides, prob, BBOX_CONF_THRESH, proposals);
+     //std::cout << "num of boxes before nms: " << proposals.size() << std::endl;
+
+     qsort_descent_inplace(proposals);
+
+     vector<int> picked;
+     nms_sorted_bboxes(proposals, picked, NMS_THRESH);
+
+
+     int count = picked.size();
+
+     //std::cout << "num of boxes: " << count << std::endl;
+
+     objects.resize(count);
+     for (int i = 0; i < count; i++)
+     {
+         objects[i] = proposals[picked[i]];
+
+         // adjust offset to original unpadded
+         float x0 = (objects[i].rect.x) / scale;
+         float y0 = (objects[i].rect.y) / scale;
+         float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
+         float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;
+
+         // clip
+         // x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
+         // y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
+         // x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
+         // y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);
+
+         objects[i].rect.x = x0;
+         objects[i].rect.y = y0;
+         objects[i].rect.width = x1 - x0;
+         objects[i].rect.height = y1 - y0;
+     }
+ }
+
+ const float color_list[80][3] =
+ {
+     {0.000, 0.447, 0.741},
+     {0.850, 0.325, 0.098},
+     {0.929, 0.694, 0.125},
+     {0.494, 0.184, 0.556},
+     {0.466, 0.674, 0.188},
+     {0.301, 0.745, 0.933},
+     {0.635, 0.078, 0.184},
+     {0.300, 0.300, 0.300},
+     {0.600, 0.600, 0.600},
+     {1.000, 0.000, 0.000},
+     {1.000, 0.500, 0.000},
+     {0.749, 0.749, 0.000},
+     {0.000, 1.000, 0.000},
+     {0.000, 0.000, 1.000},
+     {0.667, 0.000, 1.000},
+     {0.333, 0.333, 0.000},
+     {0.333, 0.667, 0.000},
+     {0.333, 1.000, 0.000},
+     {0.667, 0.333, 0.000},
+     {0.667, 0.667, 0.000},
+     {0.667, 1.000, 0.000},
+     {1.000, 0.333, 0.000},
+     {1.000, 0.667, 0.000},
+     {1.000, 1.000, 0.000},
+     {0.000, 0.333, 0.500},
+     {0.000, 0.667, 0.500},
+     {0.000, 1.000, 0.500},
+     {0.333, 0.000, 0.500},
+     {0.333, 0.333, 0.500},
+     {0.333, 0.667, 0.500},
+     {0.333, 1.000, 0.500},
+     {0.667, 0.000, 0.500},
+     {0.667, 0.333, 0.500},
+     {0.667, 0.667, 0.500},
+     {0.667, 1.000, 0.500},
+     {1.000, 0.000, 0.500},
+     {1.000, 0.333, 0.500},
+     {1.000, 0.667, 0.500},
+     {1.000, 1.000, 0.500},
+     {0.000, 0.333, 1.000},
+     {0.000, 0.667, 1.000},
+     {0.000, 1.000, 1.000},
+     {0.333, 0.000, 1.000},
+     {0.333, 0.333, 1.000},
+     {0.333, 0.667, 1.000},
+     {0.333, 1.000, 1.000},
+     {0.667, 0.000, 1.000},
+     {0.667, 0.333, 1.000},
+     {0.667, 0.667, 1.000},
+     {0.667, 1.000, 1.000},
+     {1.000, 0.000, 1.000},
+     {1.000, 0.333, 1.000},
+     {1.000, 0.667, 1.000},
+     {0.333, 0.000, 0.000},
+     {0.500, 0.000, 0.000},
+     {0.667, 0.000, 0.000},
+     {0.833, 0.000, 0.000},
+     {1.000, 0.000, 0.000},
+     {0.000, 0.167, 0.000},
+     {0.000, 0.333, 0.000},
+     {0.000, 0.500, 0.000},
+     {0.000, 0.667, 0.000},
+     {0.000, 0.833, 0.000},
+     {0.000, 1.000, 0.000},
+     {0.000, 0.000, 0.167},
+     {0.000, 0.000, 0.333},
+     {0.000, 0.000, 0.500},
+     {0.000, 0.000, 0.667},
+     {0.000, 0.000, 0.833},
+     {0.000, 0.000, 1.000},
+     {0.000, 0.000, 0.000},
+     {0.143, 0.143, 0.143},
+     {0.286, 0.286, 0.286},
+     {0.429, 0.429, 0.429},
+     {0.571, 0.571, 0.571},
+     {0.714, 0.714, 0.714},
+     {0.857, 0.857, 0.857},
+     {0.000, 0.447, 0.741},
+     {0.314, 0.717, 0.741},
+     {0.50, 0.5, 0}
+ };
+
+ void doInference(IExecutionContext& context, float* input, float* output, const int output_size, Size input_shape) {
+     const ICudaEngine& engine = context.getEngine();
+
+     // Pointers to input and output device buffers to pass to engine.
+     // Engine requires exactly IEngine::getNbBindings() number of buffers.
+     assert(engine.getNbBindings() == 2);
+     void* buffers[2];
+
+     // In order to bind the buffers, we need to know the names of the input and output tensors.
+     // Note that indices are guaranteed to be less than IEngine::getNbBindings()
+     const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
+
+     assert(engine.getBindingDataType(inputIndex) == nvinfer1::DataType::kFLOAT);
+     const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);
+     assert(engine.getBindingDataType(outputIndex) == nvinfer1::DataType::kFLOAT);
+     int mBatchSize = engine.getMaxBatchSize();
+
+     // Create GPU buffers on device
+     CHECK(cudaMalloc(&buffers[inputIndex], 3 * input_shape.height * input_shape.width * sizeof(float)));
+     CHECK(cudaMalloc(&buffers[outputIndex], output_size * sizeof(float)));
+
+     // Create stream
+     cudaStream_t stream;
+     CHECK(cudaStreamCreate(&stream));
+
+     // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
+     CHECK(cudaMemcpyAsync(buffers[inputIndex], input, 3 * input_shape.height * input_shape.width * sizeof(float), cudaMemcpyHostToDevice, stream));
+     context.enqueue(1, buffers, stream, nullptr);
+     CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
+     cudaStreamSynchronize(stream);
+
+     // Release stream and buffers
+     cudaStreamDestroy(stream);
+     CHECK(cudaFree(buffers[inputIndex]));
+     CHECK(cudaFree(buffers[outputIndex]));
+ }
+
+ int main(int argc, char** argv) {
+     cudaSetDevice(DEVICE);
+
+     // create a model using the API directly and serialize it to a stream
+     char *trtModelStream{nullptr};
+     size_t size{0};
+
+     if (argc == 4 && string(argv[2]) == "-i") {
+         const string engine_file_path {argv[1]};
+         ifstream file(engine_file_path, ios::binary);
+         if (file.good()) {
+             file.seekg(0, file.end);
+             size = file.tellg();
+             file.seekg(0, file.beg);
+             trtModelStream = new char[size];
+             assert(trtModelStream);
+             file.read(trtModelStream, size);
+             file.close();
+         }
+     } else {
+         cerr << "arguments not right!" << endl;
+         cerr << "run 'python3 tools/trt.py -f exps/example/mot/yolox_s_mix_det.py -c pretrained/bytetrack_s_mot17.pth.tar' to serialize model first!" << std::endl;
+         cerr << "Then use the following command:" << endl;
+         cerr << "cd demo/TensorRT/cpp/build" << endl;
+         cerr << "./bytetrack ../../../../YOLOX_outputs/yolox_s_mix_det/model_trt.engine -i ../../../../videos/palace.mp4 // deserialize file and run inference" << std::endl;
+         return -1;
+     }
+     const string input_video_path {argv[3]};
+
+     IRuntime* runtime = createInferRuntime(gLogger);
+     assert(runtime != nullptr);
+     ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
+     assert(engine != nullptr);
+     IExecutionContext* context = engine->createExecutionContext();
+     assert(context != nullptr);
+     delete[] trtModelStream;
+     auto out_dims = engine->getBindingDimensions(1);
+     auto output_size = 1;
+     for (int j = 0; j < out_dims.nbDims; j++) {
+         output_size *= out_dims.d[j];
+     }
+     static float* prob = new float[output_size];
+
+     VideoCapture cap(input_video_path);
+     if (!cap.isOpened())
+         return 0;
+
+     int img_w = cap.get(CAP_PROP_FRAME_WIDTH);
+     int img_h = cap.get(CAP_PROP_FRAME_HEIGHT);
+     int fps = cap.get(CAP_PROP_FPS);
+     long nFrame = static_cast<long>(cap.get(CAP_PROP_FRAME_COUNT));
+     cout << "Total frames: " << nFrame << endl;
+
+     VideoWriter writer("demo.mp4", VideoWriter::fourcc('m', 'p', '4', 'v'), fps, Size(img_w, img_h));
+
+     Mat img;
+     BYTETracker tracker(fps, 30);
+     int num_frames = 0;
+     int total_ms = 0;
+     while (true)
+     {
+         if (!cap.read(img))
+             break;
+         num_frames++;
+         if (num_frames % 20 == 0)
+         {
+             cout << "Processing frame " << num_frames << " (" << num_frames * 1000000 / total_ms << " fps)" << endl;
+         }
+         if (img.empty())
+             break;
+         Mat pr_img = static_resize(img);
+
+         float* blob;
+         blob = blobFromImage(pr_img);
+         float scale = min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
+
+         // run inference
+         auto start = chrono::system_clock::now();
+         doInference(*context, blob, prob, output_size, pr_img.size());
+         vector<Object> objects;
+         decode_outputs(prob, objects, scale, img_w, img_h);
+         vector<STrack> output_stracks = tracker.update(objects);
+         auto end = chrono::system_clock::now();
+         total_ms = total_ms + chrono::duration_cast<chrono::microseconds>(end - start).count();
+
+         for (int i = 0; i < output_stracks.size(); i++)
+         {
+             vector<float> tlwh = output_stracks[i].tlwh;
+             if (tlwh[2] * tlwh[3] > 20)
+             {
+                 Scalar s = tracker.get_color(output_stracks[i].track_id);
+                 putText(img, format("%d", output_stracks[i].track_id), Point(tlwh[0], tlwh[1] - 5),
+                         0, 0.6, Scalar(0, 0, 255), 2, LINE_AA);
+                 rectangle(img, Rect(tlwh[0], tlwh[1], tlwh[2], tlwh[3]), s, 2);
+             }
+         }
+         putText(img, format("frame: %d fps: %d num: %d", num_frames, num_frames * 1000000 / total_ms, (int)output_stracks.size()),
+                 Point(0, 30), 0, 0.6, Scalar(0, 0, 255), 2, LINE_AA);
+         writer.write(img);
+
+         delete[] blob;  // allocated with new float[], so delete[] is required
+         char c = waitKey(1);
+         if (c > 0)
+         {
+             break;
+         }
+     }
+     cap.release();
+     cout << "FPS: " << num_frames * 1000000 / total_ms << endl;
+     // destroy the engine
+     context->destroy();
+     engine->destroy();
+     runtime->destroy();
+     return 0;
+ }
deploy/TensorRT/cpp/src/kalmanFilter.cpp ADDED
@@ -0,0 +1,152 @@
+ #include "kalmanFilter.h"
+ #include <Eigen/Cholesky>
+
+ namespace byte_kalman
+ {
+     const double KalmanFilter::chi2inv95[10] = {
+         0,
+         3.8415,
+         5.9915,
+         7.8147,
+         9.4877,
+         11.070,
+         12.592,
+         14.067,
+         15.507,
+         16.919
+     };
+     KalmanFilter::KalmanFilter()
+     {
+         int ndim = 4;
+         double dt = 1.;
+
+         _motion_mat = Eigen::MatrixXf::Identity(8, 8);
+         for (int i = 0; i < ndim; i++) {
+             _motion_mat(i, ndim + i) = dt;
+         }
+         _update_mat = Eigen::MatrixXf::Identity(4, 8);
+
+         this->_std_weight_position = 1. / 20;
+         this->_std_weight_velocity = 1. / 160;
+     }
+
+     KAL_DATA KalmanFilter::initiate(const DETECTBOX &measurement)
+     {
+         DETECTBOX mean_pos = measurement;
+         DETECTBOX mean_vel;
+         for (int i = 0; i < 4; i++) mean_vel(i) = 0;
+
+         KAL_MEAN mean;
+         for (int i = 0; i < 8; i++) {
+             if (i < 4) mean(i) = mean_pos(i);
+             else mean(i) = mean_vel(i - 4);
+         }
+
+         KAL_MEAN std;
+         std(0) = 2 * _std_weight_position * measurement[3];
+         std(1) = 2 * _std_weight_position * measurement[3];
+         std(2) = 1e-2;
+         std(3) = 2 * _std_weight_position * measurement[3];
+         std(4) = 10 * _std_weight_velocity * measurement[3];
+         std(5) = 10 * _std_weight_velocity * measurement[3];
+         std(6) = 1e-5;
+         std(7) = 10 * _std_weight_velocity * measurement[3];
+
+         KAL_MEAN tmp = std.array().square();
+         KAL_COVA var = tmp.asDiagonal();
+         return std::make_pair(mean, var);
+     }
+
+     void KalmanFilter::predict(KAL_MEAN &mean, KAL_COVA &covariance)
+     {
+         //revise the data;
+         DETECTBOX std_pos;
+         std_pos << _std_weight_position * mean(3),
+             _std_weight_position * mean(3),
+             1e-2,
+             _std_weight_position * mean(3);
+         DETECTBOX std_vel;
+         std_vel << _std_weight_velocity * mean(3),
+             _std_weight_velocity * mean(3),
+             1e-5,
+             _std_weight_velocity * mean(3);
+         KAL_MEAN tmp;
+         tmp.block<1, 4>(0, 0) = std_pos;
+         tmp.block<1, 4>(0, 4) = std_vel;
+         tmp = tmp.array().square();
+         KAL_COVA motion_cov = tmp.asDiagonal();
+         KAL_MEAN mean1 = this->_motion_mat * mean.transpose();
+         KAL_COVA covariance1 = this->_motion_mat * covariance * (_motion_mat.transpose());
+         covariance1 += motion_cov;
+
+         mean = mean1;
+         covariance = covariance1;
+     }
+
+     KAL_HDATA KalmanFilter::project(const KAL_MEAN &mean, const KAL_COVA &covariance)
+     {
+         DETECTBOX std;
+         std << _std_weight_position * mean(3), _std_weight_position * mean(3),
+             1e-1, _std_weight_position * mean(3);
+         KAL_HMEAN mean1 = _update_mat * mean.transpose();
+         KAL_HCOVA covariance1 = _update_mat * covariance * (_update_mat.transpose());
+         Eigen::Matrix<float, 4, 4> diag = std.asDiagonal();
+         diag = diag.array().square().matrix();
+         covariance1 += diag;
+         // covariance1.diagonal() << diag;
+         return std::make_pair(mean1, covariance1);
+     }
+
+     KAL_DATA
+     KalmanFilter::update(
+         const KAL_MEAN &mean,
+         const KAL_COVA &covariance,
+         const DETECTBOX &measurement)
+     {
+         KAL_HDATA pa = project(mean, covariance);
+         KAL_HMEAN projected_mean = pa.first;
+         KAL_HCOVA projected_cov = pa.second;
+
+         //chol_factor, lower =
+         //scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False)
+         //kalman_gain =
+         //scipy.linalg.cho_solve((cho_factor, lower),
+         //np.dot(covariance, self._update_mat.T).T,
+         //check_finite=False).T
+         Eigen::Matrix<float, 4, 8> B = (covariance * (_update_mat.transpose())).transpose();
+         Eigen::Matrix<float, 8, 4> kalman_gain = (projected_cov.llt().solve(B)).transpose(); // eg.8x4
+         Eigen::Matrix<float, 1, 4> innovation = measurement - projected_mean; //eg.1x4
+         auto tmp = innovation * (kalman_gain.transpose());
+         KAL_MEAN new_mean = (mean.array() + tmp.array()).matrix();
+         KAL_COVA new_covariance = covariance - kalman_gain * projected_cov * (kalman_gain.transpose());
+         return std::make_pair(new_mean, new_covariance);
+     }
+
+     Eigen::Matrix<float, 1, -1>
+     KalmanFilter::gating_distance(
+         const KAL_MEAN &mean,
+         const KAL_COVA &covariance,
+         const std::vector<DETECTBOX> &measurements,
+         bool only_position)
+     {
+         KAL_HDATA pa = this->project(mean, covariance);
+         if (only_position) {
+             printf("not implement!");
+             exit(0);
+         }
+         KAL_HMEAN mean1 = pa.first;
+         KAL_HCOVA covariance1 = pa.second;
+
+         // Eigen::Matrix<float, -1, 4, Eigen::RowMajor> d(size, 4);
+         DETECTBOXSS d(measurements.size(), 4);
+         int pos = 0;
+         for (DETECTBOX box : measurements) {
+             d.row(pos++) = box - mean1;
+         }
+         Eigen::Matrix<float, -1, -1, Eigen::RowMajor> factor = covariance1.llt().matrixL();
+         Eigen::Matrix<float, -1, -1> z = factor.triangularView<Eigen::Lower>().solve<Eigen::OnTheRight>(d).transpose();
+         auto zz = ((z.array())*(z.array())).matrix();
+         auto square_maha = zz.colwise().sum();
+         return square_maha;
+     }
+ }
deploy/TensorRT/cpp/src/lapjv.cpp ADDED
@@ -0,0 +1,343 @@
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+
+ #include "lapjv.h"
+
+ /** Column-reduction and reduction transfer for a dense cost matrix.
+  */
+ int_t _ccrrt_dense(const uint_t n, cost_t *cost[],
+                    int_t *free_rows, int_t *x, int_t *y, cost_t *v)
+ {
+     int_t n_free_rows;
+     boolean *unique;
+
+     for (uint_t i = 0; i < n; i++) {
+         x[i] = -1;
+         v[i] = LARGE;
+         y[i] = 0;
+     }
+     for (uint_t i = 0; i < n; i++) {
+         for (uint_t j = 0; j < n; j++) {
+             const cost_t c = cost[i][j];
+             if (c < v[j]) {
+                 v[j] = c;
+                 y[j] = i;
+             }
+             PRINTF("i=%d, j=%d, c[i,j]=%f, v[j]=%f y[j]=%d\n", i, j, c, v[j], y[j]);
+         }
+     }
+     PRINT_COST_ARRAY(v, n);
+     PRINT_INDEX_ARRAY(y, n);
+     NEW(unique, boolean, n);
+     memset(unique, TRUE, n);
+     {
+         int_t j = n;
+         do {
+             j--;
+             const int_t i = y[j];
+             if (x[i] < 0) {
+                 x[i] = j;
+             }
+             else {
+                 unique[i] = FALSE;
+                 y[j] = -1;
+             }
+         } while (j > 0);
+     }
+     n_free_rows = 0;
+     for (uint_t i = 0; i < n; i++) {
+         if (x[i] < 0) {
+             free_rows[n_free_rows++] = i;
+         }
+         else if (unique[i]) {
+             const int_t j = x[i];
+             cost_t min = LARGE;
+             for (uint_t j2 = 0; j2 < n; j2++) {
+                 if (j2 == (uint_t)j) {
+                     continue;
+                 }
+                 const cost_t c = cost[i][j2] - v[j2];
+                 if (c < min) {
+                     min = c;
+                 }
+             }
+             PRINTF("v[%d] = %f - %f\n", j, v[j], min);
+             v[j] -= min;
+         }
+     }
+     FREE(unique);
+     return n_free_rows;
+ }
+
+
+ /** Augmenting row reduction for a dense cost matrix.
+  */
+ int_t _carr_dense(
+     const uint_t n, cost_t *cost[],
+     const uint_t n_free_rows,
+     int_t *free_rows, int_t *x, int_t *y, cost_t *v)
+ {
+     uint_t current = 0;
+     int_t new_free_rows = 0;
+     uint_t rr_cnt = 0;
+     PRINT_INDEX_ARRAY(x, n);
+     PRINT_INDEX_ARRAY(y, n);
+     PRINT_COST_ARRAY(v, n);
+     PRINT_INDEX_ARRAY(free_rows, n_free_rows);
+     while (current < n_free_rows) {
+         int_t i0;
+         int_t j1, j2;
+         cost_t v1, v2, v1_new;
+         boolean v1_lowers;
+
+         rr_cnt++;
+         PRINTF("current = %d rr_cnt = %d\n", current, rr_cnt);
+         const int_t free_i = free_rows[current++];
+         j1 = 0;
+         v1 = cost[free_i][0] - v[0];
+         j2 = -1;
+         v2 = LARGE;
+         for (uint_t j = 1; j < n; j++) {
+             PRINTF("%d = %f %d = %f\n", j1, v1, j2, v2);
+             const cost_t c = cost[free_i][j] - v[j];
+             if (c < v2) {
+                 if (c >= v1) {
+                     v2 = c;
+                     j2 = j;
+                 }
+                 else {
+                     v2 = v1;
+                     v1 = c;
+                     j2 = j1;
+                     j1 = j;
+                 }
+             }
+         }
+         i0 = y[j1];
+         v1_new = v[j1] - (v2 - v1);
+         v1_lowers = v1_new < v[j1];
+         PRINTF("%d %d 1=%d,%f 2=%d,%f v1'=%f(%d,%g) \n", free_i, i0, j1, v1, j2, v2, v1_new, v1_lowers, v[j1] - v1_new);
+         if (rr_cnt < current * n) {
+             if (v1_lowers) {
+                 v[j1] = v1_new;
+             }
+             else if (i0 >= 0 && j2 >= 0) {
+                 j1 = j2;
+                 i0 = y[j2];
+             }
+             if (i0 >= 0) {
+                 if (v1_lowers) {
+                     free_rows[--current] = i0;
+                 }
+                 else {
+                     free_rows[new_free_rows++] = i0;
+                 }
+             }
+         }
+         else {
+             PRINTF("rr_cnt=%d >= %d (current=%d * n=%d)\n", rr_cnt, current * n, current, n);
+             if (i0 >= 0) {
+                 free_rows[new_free_rows++] = i0;
+             }
+         }
+         x[free_i] = j1;
+         y[j1] = free_i;
+     }
+     return new_free_rows;
+ }
+
+
+ /** Find columns with minimum d[j] and put them on the SCAN list.
+  */
+ uint_t _find_dense(const uint_t n, uint_t lo, cost_t *d, int_t *cols, int_t *y)
+ {
+     uint_t hi = lo + 1;
+     cost_t mind = d[cols[lo]];
+     for (uint_t k = hi; k < n; k++) {
+         int_t j = cols[k];
+         if (d[j] <= mind) {
+             if (d[j] < mind) {
+                 hi = lo;
+                 mind = d[j];
+             }
+             cols[k] = cols[hi];
+             cols[hi++] = j;
+         }
+     }
+     return hi;
+ }
+
+
+ // Scan all columns in TODO starting from arbitrary column in SCAN
+ // and try to decrease d of the TODO columns using the SCAN column.
+ int_t _scan_dense(const uint_t n, cost_t *cost[],
+                   uint_t *plo, uint_t *phi,
+                   cost_t *d, int_t *cols, int_t *pred,
+                   int_t *y, cost_t *v)
+ {
+     uint_t lo = *plo;
+     uint_t hi = *phi;
+     cost_t h, cred_ij;
+
+     while (lo != hi) {
+         int_t j = cols[lo++];
+         const int_t i = y[j];
+         const cost_t mind = d[j];
+         h = cost[i][j] - v[j] - mind;
+         PRINTF("i=%d j=%d h=%f\n", i, j, h);
+         // For all columns in TODO
+         for (uint_t k = hi; k < n; k++) {
+             j = cols[k];
+             cred_ij = cost[i][j] - v[j] - h;
+             if (cred_ij < d[j]) {
+                 d[j] = cred_ij;
+                 pred[j] = i;
+                 if (cred_ij == mind) {
+                     if (y[j] < 0) {
+                         return j;
+                     }
+                     cols[k] = cols[hi];
+                     cols[hi++] = j;
+                 }
+             }
+         }
+     }
+     *plo = lo;
+     *phi = hi;
+     return -1;
+ }
+
+
+ /** Single iteration of modified Dijkstra shortest path algorithm as explained in the JV paper.
+  *
+  * This is a dense matrix version.
+  *
+  * \return The closest free column index.
+  */
+ int_t find_path_dense(
+     const uint_t n, cost_t *cost[],
+     const int_t start_i,
+     int_t *y, cost_t *v,
+     int_t *pred)
+ {
+     uint_t lo = 0, hi = 0;
+     int_t final_j = -1;
+     uint_t n_ready = 0;
+     int_t *cols;
+     cost_t *d;
+
+     NEW(cols, int_t, n);
+     NEW(d, cost_t, n);
+
+     for (uint_t i = 0; i < n; i++) {
+         cols[i] = i;
+         pred[i] = start_i;
+         d[i] = cost[start_i][i] - v[i];
+     }
+     PRINT_COST_ARRAY(d, n);
+     while (final_j == -1) {
+         // No columns left on the SCAN list.
+         if (lo == hi) {
+             PRINTF("%d..%d -> find\n", lo, hi);
+             n_ready = lo;
+             hi = _find_dense(n, lo, d, cols, y);
+             PRINTF("check %d..%d\n", lo, hi);
+             PRINT_INDEX_ARRAY(cols, n);
+             for (uint_t k = lo; k < hi; k++) {
+                 const int_t j = cols[k];
+                 if (y[j] < 0) {
+                     final_j = j;
+                 }
+             }
+         }
+         if (final_j == -1) {
+             PRINTF("%d..%d -> scan\n", lo, hi);
+             final_j = _scan_dense(
+                 n, cost, &lo, &hi, d, cols, pred, y, v);
+             PRINT_COST_ARRAY(d, n);
+             PRINT_INDEX_ARRAY(cols, n);
+             PRINT_INDEX_ARRAY(pred, n);
+         }
+     }
+
+     PRINTF("found final_j=%d\n", final_j);
+     PRINT_INDEX_ARRAY(cols, n);
+     {
+         const cost_t mind = d[cols[lo]];
+         for (uint_t k = 0; k < n_ready; k++) {
+             const int_t j = cols[k];
+             v[j] += d[j] - mind;
+         }
+     }
+
+     FREE(cols);
+     FREE(d);
+
+     return final_j;
+ }
+
+
+ /** Augment for a dense cost matrix.
+  */
+ int_t _ca_dense(
+     const uint_t n, cost_t *cost[],
+     const uint_t n_free_rows,
+     int_t *free_rows, int_t *x, int_t *y, cost_t *v)
+ {
+     int_t *pred;
+
+     NEW(pred, int_t, n);
+
+     for (int_t *pfree_i = free_rows; pfree_i < free_rows + n_free_rows; pfree_i++) {
+         int_t i = -1, j;
+         uint_t k = 0;
295
+
296
+ PRINTF("looking at free_i=%d\n", *pfree_i);
297
+ j = find_path_dense(n, cost, *pfree_i, y, v, pred);
298
+ ASSERT(j >= 0);
299
+ ASSERT(j < n);
300
+ while (i != *pfree_i) {
301
+ PRINTF("augment %d\n", j);
302
+ PRINT_INDEX_ARRAY(pred, n);
303
+ i = pred[j];
304
+ PRINTF("y[%d]=%d -> %d\n", j, y[j], i);
305
+ y[j] = i;
306
+ PRINT_INDEX_ARRAY(x, n);
307
+ SWAP_INDICES(j, x[i]);
308
+ k++;
309
+ if (k >= n) {
310
+ ASSERT(FALSE);
311
+ }
312
+ }
313
+ }
314
+ FREE(pred);
315
+ return 0;
316
+ }
317
+
318
+
319
+ /** Solve dense LAP (Jonker-Volgenant).
320
+ */
321
+ int lapjv_internal(
322
+ const uint_t n, cost_t *cost[],
323
+ int_t *x, int_t *y)
324
+ {
325
+ int ret;
326
+ int_t *free_rows;
327
+ cost_t *v;
328
+
329
+ NEW(free_rows, int_t, n);
330
+ NEW(v, cost_t, n);
331
+ ret = _ccrrt_dense(n, cost, free_rows, x, y, v);
332
+ int i = 0;
333
+ while (ret > 0 && i < 2) {
334
+ ret = _carr_dense(n, cost, ret, free_rows, x, y, v);
335
+ i++;
336
+ }
337
+ if (ret > 0) {
338
+ ret = _ca_dense(n, cost, ret, free_rows, x, y, v);
339
+ }
340
+ FREE(v);
341
+ FREE(free_rows);
342
+ return ret;
343
+ }
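As a sanity reference for what `lapjv_internal` computes, a brute-force Python solver for tiny cost matrices (illustration only, exponential time; `brute_force_lap` is a hypothetical helper, not part of the uploaded file): it minimizes the total cost of a one-to-one row-to-column matching and returns the same `x`/`y` pair shape as above.

```python
from itertools import permutations

def brute_force_lap(cost):
    """Minimize sum(cost[i][perm[i]]) over all permutations.

    Returns (x, y): x[i] = column assigned to row i, y[j] = row
    assigned to column j, mirroring lapjv_internal's outputs."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    x = list(best_perm)
    y = [0] * n
    for i, j in enumerate(x):
        y[j] = i
    return x, y

cost = [[4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0]]
x, y = brute_force_lap(cost)  # x == [1, 0, 2], total cost 1 + 2 + 2 = 5
```

The JV algorithm above reaches the same optimum in roughly cubic time instead of factorial.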
deploy/TensorRT/cpp/src/utils.cpp ADDED
@@ -0,0 +1,429 @@
1
+ #include "BYTETracker.h"
2
+ #include "lapjv.h"
3
+
4
+ vector<STrack*> BYTETracker::joint_stracks(vector<STrack*> &tlista, vector<STrack> &tlistb)
5
+ {
6
+ map<int, int> exists;
7
+ vector<STrack*> res;
8
+ for (int i = 0; i < tlista.size(); i++)
9
+ {
10
+ exists.insert(pair<int, int>(tlista[i]->track_id, 1));
11
+ res.push_back(tlista[i]);
12
+ }
13
+ for (int i = 0; i < tlistb.size(); i++)
14
+ {
15
+ int tid = tlistb[i].track_id;
16
+ if (exists.count(tid) == 0)
17
+ {
18
+ exists[tid] = 1;
19
+ res.push_back(&tlistb[i]);
20
+ }
21
+ }
22
+ return res;
23
+ }
24
+
25
+ vector<STrack> BYTETracker::joint_stracks(vector<STrack> &tlista, vector<STrack> &tlistb)
26
+ {
27
+ map<int, int> exists;
28
+ vector<STrack> res;
29
+ for (int i = 0; i < tlista.size(); i++)
30
+ {
31
+ exists.insert(pair<int, int>(tlista[i].track_id, 1));
32
+ res.push_back(tlista[i]);
33
+ }
34
+ for (int i = 0; i < tlistb.size(); i++)
35
+ {
36
+ int tid = tlistb[i].track_id;
37
+ if (exists.count(tid) == 0)
38
+ {
39
+ exists[tid] = 1;
40
+ res.push_back(tlistb[i]);
41
+ }
42
+ }
43
+ return res;
44
+ }
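The merge rule implemented by both `joint_stracks` overloads (union keyed by `track_id`, entries of the first list winning on duplicates) can be sketched in Python; tracks are stood in for by plain dicts (illustration only, not part of the uploaded file):

```python
def joint_tracks(list_a, list_b):
    """Union of two track lists keyed by track_id.

    list_a entries win on duplicate ids, mirroring the
    joint_stracks overloads above."""
    seen = {t["track_id"] for t in list_a}
    return list_a + [t for t in list_b if t["track_id"] not in seen]

joint_tracks([{"track_id": 1}], [{"track_id": 1}, {"track_id": 2}])
# -> [{'track_id': 1}, {'track_id': 2}]
```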
45
+
46
+ vector<STrack> BYTETracker::sub_stracks(vector<STrack> &tlista, vector<STrack> &tlistb)
47
+ {
48
+ map<int, STrack> stracks;
49
+ for (int i = 0; i < tlista.size(); i++)
50
+ {
51
+ stracks.insert(pair<int, STrack>(tlista[i].track_id, tlista[i]));
52
+ }
53
+ for (int i = 0; i < tlistb.size(); i++)
54
+ {
55
+ int tid = tlistb[i].track_id;
56
+ if (stracks.count(tid) != 0)
57
+ {
58
+ stracks.erase(tid);
59
+ }
60
+ }
61
+
62
+ vector<STrack> res;
63
+ std::map<int, STrack>::iterator it;
64
+ for (it = stracks.begin(); it != stracks.end(); ++it)
65
+ {
66
+ res.push_back(it->second);
67
+ }
68
+
69
+ return res;
70
+ }
71
+
72
+ void BYTETracker::remove_duplicate_stracks(vector<STrack> &resa, vector<STrack> &resb, vector<STrack> &stracksa, vector<STrack> &stracksb)
73
+ {
74
+ vector<vector<float> > pdist = iou_distance(stracksa, stracksb);
75
+ vector<pair<int, int> > pairs;
76
+ for (int i = 0; i < pdist.size(); i++)
77
+ {
78
+ for (int j = 0; j < pdist[i].size(); j++)
79
+ {
80
+ if (pdist[i][j] < 0.15)
81
+ {
82
+ pairs.push_back(pair<int, int>(i, j));
83
+ }
84
+ }
85
+ }
86
+
87
+ vector<int> dupa, dupb;
88
+ for (int i = 0; i < pairs.size(); i++)
89
+ {
90
+ int timep = stracksa[pairs[i].first].frame_id - stracksa[pairs[i].first].start_frame;
91
+ int timeq = stracksb[pairs[i].second].frame_id - stracksb[pairs[i].second].start_frame;
92
+ if (timep > timeq)
93
+ dupb.push_back(pairs[i].second);
94
+ else
95
+ dupa.push_back(pairs[i].first);
96
+ }
97
+
98
+ for (int i = 0; i < stracksa.size(); i++)
99
+ {
100
+ vector<int>::iterator iter = find(dupa.begin(), dupa.end(), i);
101
+ if (iter == dupa.end())
102
+ {
103
+ resa.push_back(stracksa[i]);
104
+ }
105
+ }
106
+
107
+ for (int i = 0; i < stracksb.size(); i++)
108
+ {
109
+ vector<int>::iterator iter = find(dupb.begin(), dupb.end(), i);
110
+ if (iter == dupb.end())
111
+ {
112
+ resb.push_back(stracksb[i]);
113
+ }
114
+ }
115
+ }
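The deduplication rule above (a pair with IoU distance below 0.15 is a duplicate; the track that has been alive longer is kept) in a compact Python sketch, using dicts with `frame_id`/`start_frame` in place of `STrack` (illustration only, not part of the uploaded file):

```python
def remove_duplicates(tracks_a, tracks_b, pdist, thresh=0.15):
    """Drop near-duplicate tracks across two lists.

    pdist[i][j] is the IoU distance between tracks_a[i] and
    tracks_b[j]; of each duplicate pair the longer-lived track
    (larger frame_id - start_frame) survives."""
    dup_a, dup_b = set(), set()
    for i, row in enumerate(pdist):
        for j, d in enumerate(row):
            if d < thresh:
                age_a = tracks_a[i]["frame_id"] - tracks_a[i]["start_frame"]
                age_b = tracks_b[j]["frame_id"] - tracks_b[j]["start_frame"]
                if age_a > age_b:
                    dup_b.add(j)
                else:
                    dup_a.add(i)
    res_a = [t for i, t in enumerate(tracks_a) if i not in dup_a]
    res_b = [t for j, t in enumerate(tracks_b) if j not in dup_b]
    return res_a, res_b
```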
116
+
117
+ void BYTETracker::linear_assignment(vector<vector<float> > &cost_matrix, int cost_matrix_size, int cost_matrix_size_size, float thresh,
118
+ vector<vector<int> > &matches, vector<int> &unmatched_a, vector<int> &unmatched_b)
119
+ {
120
+ if (cost_matrix.size() == 0)
121
+ {
122
+ for (int i = 0; i < cost_matrix_size; i++)
123
+ {
124
+ unmatched_a.push_back(i);
125
+ }
126
+ for (int i = 0; i < cost_matrix_size_size; i++)
127
+ {
128
+ unmatched_b.push_back(i);
129
+ }
130
+ return;
131
+ }
132
+
133
+ vector<int> rowsol; vector<int> colsol;
134
+ float c = lapjv(cost_matrix, rowsol, colsol, true, thresh);
135
+ for (int i = 0; i < rowsol.size(); i++)
136
+ {
137
+ if (rowsol[i] >= 0)
138
+ {
139
+ vector<int> match;
140
+ match.push_back(i);
141
+ match.push_back(rowsol[i]);
142
+ matches.push_back(match);
143
+ }
144
+ else
145
+ {
146
+ unmatched_a.push_back(i);
147
+ }
148
+ }
149
+
150
+ for (int i = 0; i < colsol.size(); i++)
151
+ {
152
+ if (colsol[i] < 0)
153
+ {
154
+ unmatched_b.push_back(i);
155
+ }
156
+ }
157
+ }
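How `linear_assignment` turns the solver's `rowsol`/`colsol` outputs into matches and unmatched indices, restated in Python (illustration only, not part of the uploaded file):

```python
def split_assignment(rowsol, colsol):
    """Mirror of the loops above: rowsol[i] >= 0 yields a match
    (i, rowsol[i]); rowsol[i] < 0 leaves row i unmatched and
    colsol[j] < 0 leaves column j unmatched."""
    matches = [[i, j] for i, j in enumerate(rowsol) if j >= 0]
    unmatched_a = [i for i, j in enumerate(rowsol) if j < 0]
    unmatched_b = [j for j, i in enumerate(colsol) if i < 0]
    return matches, unmatched_a, unmatched_b

split_assignment([1, -1, 0], [2, 0, -1])
# -> ([[0, 1], [2, 0]], [1], [2])
```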
158
+
159
+ vector<vector<float> > BYTETracker::ious(vector<vector<float> > &atlbrs, vector<vector<float> > &btlbrs)
160
+ {
161
+ vector<vector<float> > ious;
162
+ if (atlbrs.size()*btlbrs.size() == 0)
163
+ return ious;
164
+
165
+ ious.resize(atlbrs.size());
166
+ for (int i = 0; i < ious.size(); i++)
167
+ {
168
+ ious[i].resize(btlbrs.size());
169
+ }
170
+
171
+ //bbox_ious
172
+ for (int k = 0; k < btlbrs.size(); k++)
173
+ {
174
+ vector<float> ious_tmp;
175
+ float box_area = (btlbrs[k][2] - btlbrs[k][0] + 1)*(btlbrs[k][3] - btlbrs[k][1] + 1);
176
+ for (int n = 0; n < atlbrs.size(); n++)
177
+ {
178
+ float iw = min(atlbrs[n][2], btlbrs[k][2]) - max(atlbrs[n][0], btlbrs[k][0]) + 1;
179
+ if (iw > 0)
180
+ {
181
+ float ih = min(atlbrs[n][3], btlbrs[k][3]) - max(atlbrs[n][1], btlbrs[k][1]) + 1;
182
+ if(ih > 0)
183
+ {
184
+ float ua = (atlbrs[n][2] - atlbrs[n][0] + 1)*(atlbrs[n][3] - atlbrs[n][1] + 1) + box_area - iw * ih;
185
+ ious[n][k] = iw * ih / ua;
186
+ }
187
+ else
188
+ {
189
+ ious[n][k] = 0.0;
190
+ }
191
+ }
192
+ else
193
+ {
194
+ ious[n][k] = 0.0;
195
+ }
196
+ }
197
+ }
198
+
199
+ return ious;
200
+ }
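The IoU computed above uses an inclusive pixel convention (the `+ 1` on every extent). The same formula for a single box pair in Python (illustration only, not part of the uploaded file):

```python
def iou_tlbr(a, b):
    """IoU of two [x1, y1, x2, y2] boxes, using the same inclusive
    (+1) pixel convention as BYTETracker::ious above."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return iw * ih / (area_a + area_b - iw * ih)

iou_tlbr([0, 0, 9, 9], [0, 0, 9, 9])  # identical boxes -> 1.0
```

`iou_distance` then uses `1 - IoU` as the matching cost.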
201
+
202
+ vector<vector<float> > BYTETracker::iou_distance(vector<STrack*> &atracks, vector<STrack> &btracks, int &dist_size, int &dist_size_size)
203
+ {
204
+ vector<vector<float> > cost_matrix;
205
+ if (atracks.size() * btracks.size() == 0)
206
+ {
207
+ dist_size = atracks.size();
208
+ dist_size_size = btracks.size();
209
+ return cost_matrix;
210
+ }
211
+ vector<vector<float> > atlbrs, btlbrs;
212
+ for (int i = 0; i < atracks.size(); i++)
213
+ {
214
+ atlbrs.push_back(atracks[i]->tlbr);
215
+ }
216
+ for (int i = 0; i < btracks.size(); i++)
217
+ {
218
+ btlbrs.push_back(btracks[i].tlbr);
219
+ }
220
+
221
+ dist_size = atracks.size();
222
+ dist_size_size = btracks.size();
223
+
224
+ vector<vector<float> > _ious = ious(atlbrs, btlbrs);
225
+
226
+ for (int i = 0; i < _ious.size();i++)
227
+ {
228
+ vector<float> _iou;
229
+ for (int j = 0; j < _ious[i].size(); j++)
230
+ {
231
+ _iou.push_back(1 - _ious[i][j]);
232
+ }
233
+ cost_matrix.push_back(_iou);
234
+ }
235
+
236
+ return cost_matrix;
237
+ }
238
+
239
+ vector<vector<float> > BYTETracker::iou_distance(vector<STrack> &atracks, vector<STrack> &btracks)
240
+ {
241
+ vector<vector<float> > atlbrs, btlbrs;
242
+ for (int i = 0; i < atracks.size(); i++)
243
+ {
244
+ atlbrs.push_back(atracks[i].tlbr);
245
+ }
246
+ for (int i = 0; i < btracks.size(); i++)
247
+ {
248
+ btlbrs.push_back(btracks[i].tlbr);
249
+ }
250
+
251
+ vector<vector<float> > _ious = ious(atlbrs, btlbrs);
252
+ vector<vector<float> > cost_matrix;
253
+ for (int i = 0; i < _ious.size(); i++)
254
+ {
255
+ vector<float> _iou;
256
+ for (int j = 0; j < _ious[i].size(); j++)
257
+ {
258
+ _iou.push_back(1 - _ious[i][j]);
259
+ }
260
+ cost_matrix.push_back(_iou);
261
+ }
262
+
263
+ return cost_matrix;
264
+ }
265
+
266
+ double BYTETracker::lapjv(const vector<vector<float> > &cost, vector<int> &rowsol, vector<int> &colsol,
267
+ bool extend_cost, float cost_limit, bool return_cost)
268
+ {
269
+ vector<vector<float> > cost_c;
270
+ cost_c.assign(cost.begin(), cost.end());
271
+
272
+ vector<vector<float> > cost_c_extended;
273
+
274
+ int n_rows = cost.size();
275
+ int n_cols = cost[0].size();
276
+ rowsol.resize(n_rows);
277
+ colsol.resize(n_cols);
278
+
279
+ int n = 0;
280
+ if (n_rows == n_cols)
281
+ {
282
+ n = n_rows;
283
+ }
284
+ else
285
+ {
286
+ if (!extend_cost)
287
+ {
288
+ cout << "set extend_cost=True" << endl;
289
+ system("pause");
290
+ exit(0);
291
+ }
292
+ }
293
+
294
+ if (extend_cost || cost_limit < LONG_MAX)
295
+ {
296
+ n = n_rows + n_cols;
297
+ cost_c_extended.resize(n);
298
+ for (int i = 0; i < cost_c_extended.size(); i++)
299
+ cost_c_extended[i].resize(n);
300
+
301
+ if (cost_limit < LONG_MAX)
302
+ {
303
+ for (int i = 0; i < cost_c_extended.size(); i++)
304
+ {
305
+ for (int j = 0; j < cost_c_extended[i].size(); j++)
306
+ {
307
+ cost_c_extended[i][j] = cost_limit / 2.0;
308
+ }
309
+ }
310
+ }
311
+ else
312
+ {
313
+ float cost_max = -1;
314
+ for (int i = 0; i < cost_c.size(); i++)
315
+ {
316
+ for (int j = 0; j < cost_c[i].size(); j++)
317
+ {
318
+ if (cost_c[i][j] > cost_max)
319
+ cost_max = cost_c[i][j];
320
+ }
321
+ }
322
+ for (int i = 0; i < cost_c_extended.size(); i++)
323
+ {
324
+ for (int j = 0; j < cost_c_extended[i].size(); j++)
325
+ {
326
+ cost_c_extended[i][j] = cost_max + 1;
327
+ }
328
+ }
329
+ }
330
+
331
+ for (int i = n_rows; i < cost_c_extended.size(); i++)
332
+ {
333
+ for (int j = n_cols; j < cost_c_extended[i].size(); j++)
334
+ {
335
+ cost_c_extended[i][j] = 0;
336
+ }
337
+ }
338
+ for (int i = 0; i < n_rows; i++)
339
+ {
340
+ for (int j = 0; j < n_cols; j++)
341
+ {
342
+ cost_c_extended[i][j] = cost_c[i][j];
343
+ }
344
+ }
345
+
346
+ cost_c.clear();
347
+ cost_c.assign(cost_c_extended.begin(), cost_c_extended.end());
348
+ }
349
+
350
+ double **cost_ptr;
351
+ cost_ptr = new double *[n];
352
+ for (int i = 0; i < n; i++)
353
+ cost_ptr[i] = new double[n];
354
+
355
+ for (int i = 0; i < n; i++)
356
+ {
357
+ for (int j = 0; j < n; j++)
358
+ {
359
+ cost_ptr[i][j] = cost_c[i][j];
360
+ }
361
+ }
362
+
363
+ int* x_c = new int[n];
364
+ int *y_c = new int[n];
365
+
366
+ int ret = lapjv_internal(n, cost_ptr, x_c, y_c);
367
+ if (ret != 0)
368
+ {
369
+ cout << "Calculate Wrong!" << endl;
370
+ system("pause");
371
+ exit(0);
372
+ }
373
+
374
+ double opt = 0.0;
375
+
376
+ if (n != n_rows)
377
+ {
378
+ for (int i = 0; i < n; i++)
379
+ {
380
+ if (x_c[i] >= n_cols)
381
+ x_c[i] = -1;
382
+ if (y_c[i] >= n_rows)
383
+ y_c[i] = -1;
384
+ }
385
+ for (int i = 0; i < n_rows; i++)
386
+ {
387
+ rowsol[i] = x_c[i];
388
+ }
389
+ for (int i = 0; i < n_cols; i++)
390
+ {
391
+ colsol[i] = y_c[i];
392
+ }
393
+
394
+ if (return_cost)
395
+ {
396
+ for (int i = 0; i < rowsol.size(); i++)
397
+ {
398
+ if (rowsol[i] != -1)
399
+ {
400
+ //cout << i << "\t" << rowsol[i] << "\t" << cost_ptr[i][rowsol[i]] << endl;
401
+ opt += cost_ptr[i][rowsol[i]];
402
+ }
403
+ }
404
+ }
405
+ }
406
+ else if (return_cost)
407
+ {
408
+ for (int i = 0; i < rowsol.size(); i++)
409
+ {
410
+ opt += cost_ptr[i][rowsol[i]];
411
+ }
412
+ }
413
+
414
+ for (int i = 0; i < n; i++)
415
+ {
416
+ delete[]cost_ptr[i];
417
+ }
418
+ delete[]cost_ptr;
419
+ delete[]x_c;
420
+ delete[]y_c;
421
+
422
+ return opt;
423
+ }
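The wrapper above handles rectangular matrices and the `cost_limit` threshold by embedding the r x c matrix in an (r+c) x (r+c) square one. A Python sketch of that padding (illustration only; `extend_cost_matrix` is a hypothetical helper, not part of the uploaded file):

```python
def extend_cost_matrix(cost, cost_limit):
    """Embed an r x c cost matrix in an (r+c) x (r+c) square one.

    Padding entries cost cost_limit / 2 each, so a real match costing
    more than cost_limit is beaten by routing its row and column to
    dummies (two paddings); the dummy-dummy block is free (0)."""
    r, c = len(cost), len(cost[0])
    n = r + c
    ext = [[cost_limit / 2.0] * n for _ in range(n)]
    for i in range(r, n):          # dummy-row x dummy-column block
        for j in range(c, n):
            ext[i][j] = 0.0
    for i in range(r):             # original costs, top-left block
        for j in range(c):
            ext[i][j] = cost[i][j]
    return ext
```

Rows assigned to a dummy column (index >= c) are then reported as unmatched, which is exactly the `x_c[i] >= n_cols -> -1` fix-up above.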
424
+
425
+ Scalar BYTETracker::get_color(int idx)
426
+ {
427
+ idx += 3;
428
+ return Scalar(37 * idx % 255, 17 * idx % 255, 29 * idx % 255);
429
+ }
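The per-track color is a pure function of the track id; the same arithmetic in Python (illustration only, not part of the uploaded file):

```python
def get_color(idx):
    """Deterministic pseudo-random BGR color for a track id,
    mirroring BYTETracker::get_color above."""
    idx += 3
    return (37 * idx % 255, 17 * idx % 255, 29 * idx % 255)

get_color(0)  # -> (111, 51, 87)
```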
deploy/TensorRT/python/README.md ADDED
@@ -0,0 +1,22 @@
1
+ # ByteTrack-TensorRT in Python
2
+
3
+ ## Install TensorRT Toolkit
4
+ Please follow the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) and the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt) to install TensorRT (version 7 recommended) and torch2trt.
5
+
6
+ ## Convert model
7
+
8
+ You can convert the PyTorch model "bytetrack_s_mot17" to a TensorRT model by running:
9
+
10
+ ```shell
11
+ cd <ByteTrack_HOME>
12
+ python3 tools/trt.py -f exps/example/mot/yolox_s_mix_det.py -c pretrained/bytetrack_s_mot17.pth.tar
13
+ ```
14
+
15
+ ## Run TensorRT demo
16
+
17
+ You can use the converted model_trt.pth to run the TensorRT demo at **130 FPS**:
18
+
19
+ ```shell
20
+ cd <ByteTrack_HOME>
21
+ python3 tools/demo_track.py video -f exps/example/mot/yolox_s_mix_det.py --trt --save_result
22
+ ```
deploy/ncnn/cpp/CMakeLists.txt ADDED
@@ -0,0 +1,84 @@
1
+ macro(ncnn_add_example name)
2
+ add_executable(${name} ${name}.cpp)
3
+ if(OpenCV_FOUND)
4
+ target_include_directories(${name} PRIVATE ${OpenCV_INCLUDE_DIRS})
5
+ target_link_libraries(${name} PRIVATE ncnn ${OpenCV_LIBS})
6
+ elseif(NCNN_SIMPLEOCV)
7
+ target_compile_definitions(${name} PUBLIC USE_NCNN_SIMPLEOCV)
8
+ target_link_libraries(${name} PRIVATE ncnn)
9
+ endif()
10
+
11
+ # add test to a virtual project group
12
+ set_property(TARGET ${name} PROPERTY FOLDER "examples")
13
+ endmacro()
14
+
15
+ if(NCNN_PIXEL)
16
+ find_package(OpenCV QUIET COMPONENTS opencv_world)
17
+ # for opencv 2.4 on ubuntu 16.04, there is no opencv_world but OpenCV_FOUND will be TRUE
18
+ if("${OpenCV_LIBS}" STREQUAL "")
19
+ set(OpenCV_FOUND FALSE)
20
+ endif()
21
+ if(NOT OpenCV_FOUND)
22
+ find_package(OpenCV QUIET COMPONENTS core highgui imgproc imgcodecs videoio)
23
+ endif()
24
+ if(NOT OpenCV_FOUND)
25
+ find_package(OpenCV QUIET COMPONENTS core highgui imgproc)
26
+ endif()
27
+
28
+ if(OpenCV_FOUND OR NCNN_SIMPLEOCV)
29
+ if(OpenCV_FOUND)
30
+ message(STATUS "OpenCV library: ${OpenCV_INSTALL_PATH}")
31
+ message(STATUS " version: ${OpenCV_VERSION}")
32
+ message(STATUS " libraries: ${OpenCV_LIBS}")
33
+ message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
34
+
35
+ if(${OpenCV_VERSION_MAJOR} GREATER 3)
36
+ set(CMAKE_CXX_STANDARD 11)
37
+ endif()
38
+ endif()
39
+
40
+ include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../src)
41
+ include_directories(${CMAKE_CURRENT_BINARY_DIR}/../src)
42
+ include_directories(include)
43
+ include_directories(/usr/local/include/eigen3)
44
+
45
+ ncnn_add_example(squeezenet)
46
+ ncnn_add_example(squeezenet_c_api)
47
+ ncnn_add_example(fasterrcnn)
48
+ ncnn_add_example(rfcn)
49
+ ncnn_add_example(yolov2)
50
+ ncnn_add_example(yolov3)
51
+ if(OpenCV_FOUND)
52
+ ncnn_add_example(yolov4)
53
+ endif()
54
+ ncnn_add_example(yolov5)
55
+ ncnn_add_example(yolox)
56
+ ncnn_add_example(mobilenetv2ssdlite)
57
+ ncnn_add_example(mobilenetssd)
58
+ ncnn_add_example(squeezenetssd)
59
+ ncnn_add_example(shufflenetv2)
60
+ ncnn_add_example(peleenetssd_seg)
61
+ ncnn_add_example(simplepose)
62
+ ncnn_add_example(retinaface)
63
+ ncnn_add_example(yolact)
64
+ ncnn_add_example(nanodet)
65
+ ncnn_add_example(scrfd)
66
+ ncnn_add_example(scrfd_crowdhuman)
67
+ ncnn_add_example(rvm)
68
+ file(GLOB My_Source_Files src/*.cpp)
69
+ add_executable(bytetrack ${My_Source_Files})
70
+ if(OpenCV_FOUND)
71
+ target_include_directories(bytetrack PRIVATE ${OpenCV_INCLUDE_DIRS})
72
+ target_link_libraries(bytetrack PRIVATE ncnn ${OpenCV_LIBS})
73
+ elseif(NCNN_SIMPLEOCV)
74
+ target_compile_definitions(bytetrack PUBLIC USE_NCNN_SIMPLEOCV)
75
+ target_link_libraries(bytetrack PRIVATE ncnn)
76
+ endif()
77
+ # add test to a virtual project group
78
+ set_property(TARGET bytetrack PROPERTY FOLDER "examples")
79
+ else()
80
+ message(WARNING "OpenCV not found and NCNN_SIMPLEOCV disabled, examples won't be built")
81
+ endif()
82
+ else()
83
+ message(WARNING "NCNN_PIXEL not enabled, examples won't be built")
84
+ endif()
deploy/ncnn/cpp/README.md ADDED
@@ -0,0 +1,103 @@
1
+ # ByteTrack-CPP-ncnn
2
+
3
+ ## Installation
4
+
5
+ Clone [ncnn](https://github.com/Tencent/ncnn) first, then please following [build tutorial of ncnn](https://github.com/Tencent/ncnn/wiki/how-to-build) to build on your own device.
6
+
7
+ Install eigen-3.3.9 [[google]](https://drive.google.com/file/d/1rqO74CYCNrmRAg8Rra0JP3yZtJ-rfket/view?usp=sharing), [[baidu(code:ueq4)]](https://pan.baidu.com/s/15kEfCxpy-T7tz60msxxExg).
8
+
9
+ ```shell
10
+ unzip eigen-3.3.9.zip
11
+ cd eigen-3.3.9
12
+ mkdir build
13
+ cd build
14
+ cmake ..
15
+ sudo make install
16
+ ```
17
+
18
+ ## Generate onnx file
19
+ Use provided tools to generate onnx file.
20
+ For example, if you want to generate onnx file of bytetrack_s_mot17.pth, please run the following command:
21
+ ```shell
22
+ cd <ByteTrack_HOME>
23
+ python3 tools/export_onnx.py -f exps/example/mot/yolox_s_mix_det.py -c pretrained/bytetrack_s_mot17.pth.tar
24
+ ```
25
+ Then, a bytetrack_s.onnx file is generated under <ByteTrack_HOME>.
26
+
27
+ ## Generate ncnn param and bin file
28
+ Put bytetrack_s.onnx under ncnn/build/tools/onnx and then run:
29
+
30
+ ```shell
31
+ cd ncnn/build/tools/onnx
32
+ ./onnx2ncnn bytetrack_s.onnx bytetrack_s.param bytetrack_s.bin
33
+ ```
34
+
35
+ Since Focus module is not supported in ncnn. Warnings like:
36
+ ```shell
37
+ Unsupported slice step !
38
+ ```
39
+ will be printed. However, don't worry! C++ version of Focus layer is already implemented in src/bytetrack.cpp.
40
+
41
+ ## Modify param file
42
+ Open **bytetrack_s.param**, and modify it.
43
+ Before (just an example):
44
+ ```
45
+ 235 268
46
+ Input images 0 1 images
47
+ Split splitncnn_input0 1 4 images images_splitncnn_0 images_splitncnn_1 images_splitncnn_2 images_splitncnn_3
48
+ Crop Slice_4 1 1 images_splitncnn_3 467 -23309=1,0 -23310=1,2147483647 -23311=1,1
49
+ Crop Slice_9 1 1 467 472 -23309=1,0 -23310=1,2147483647 -23311=1,2
50
+ Crop Slice_14 1 1 images_splitncnn_2 477 -23309=1,0 -23310=1,2147483647 -23311=1,1
51
+ Crop Slice_19 1 1 477 482 -23309=1,1 -23310=1,2147483647 -23311=1,2
52
+ Crop Slice_24 1 1 images_splitncnn_1 487 -23309=1,1 -23310=1,2147483647 -23311=1,1
53
+ Crop Slice_29 1 1 487 492 -23309=1,0 -23310=1,2147483647 -23311=1,2
54
+ Crop Slice_34 1 1 images_splitncnn_0 497 -23309=1,1 -23310=1,2147483647 -23311=1,1
55
+ Crop Slice_39 1 1 497 502 -23309=1,1 -23310=1,2147483647 -23311=1,2
56
+ Concat Concat_40 4 1 472 492 482 502 503 0=0
57
+ ...
58
+ ```
59
+ * Change the first number from 235 to 235 - 9 = 226 (we remove 10 layers and add 1, so the total layer count decreases by 9).
60
+ * Then remove the 10 lines from Split through Concat, but note the second-to-last number on the Concat line (the output blob name): 503.
61
+ * Add a YoloV5Focus layer after Input (reusing the number 503):
62
+ ```
63
+ YoloV5Focus focus 1 1 images 503
64
+ ```
65
+ After (just an example):
66
+ ```
67
+ 226 328
68
+ Input images 0 1 images
69
+ YoloV5Focus focus 1 1 images 503
70
+ ...
71
+ ```
72
+
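The YoloV5Focus layer that replaces the removed Slice/Concat block is a space-to-depth rearrangement: it concatenates the four pixel-parity grids of each channel. A pure-Python sketch for one H x W channel as lists of lists (illustration only, not part of this README's commands):

```python
def focus_2d(x):
    """Space-to-depth for one H x W channel, matching the four strided
    slices the Focus layer concatenates: the (even, even), (odd, even),
    (even, odd) and (odd, odd) pixel grids.  A real Focus layer stacks
    these along the channel axis for every input channel, turning
    (C, H, W) into (4C, H/2, W/2)."""
    slices = []
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        slices.append([row[col_off::2] for row in x[row_off::2]])
    return slices  # four H/2 x W/2 grids

focus_2d([[0, 1],
          [2, 3]])  # -> [[[0]], [[2]], [[1]], [[3]]]
```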
73
+ ## Use ncnn_optimize to generate new param and bin
74
+ ```shell
75
+ # suppose you are still under ncnn/build/tools/onnx dir.
76
+ ../ncnnoptimize bytetrack_s.param bytetrack_s.bin bytetrack_s_op.param bytetrack_s_op.bin 65536
77
+ ```
78
+
79
+ ## Copy files and build ByteTrack
80
+ Copy or move the 'src' and 'include' folders and the 'CMakeLists.txt' file into ncnn/examples. Copy bytetrack_s_op.param, bytetrack_s_op.bin and <ByteTrack_HOME>/videos/palace.mp4 into ncnn/build/examples. Then build ByteTrack:
81
+
82
+ ```shell
83
+ cd ncnn/build/examples
84
+ cmake ..
85
+ make
86
+ ```
87
+
88
+ ## Run the demo
89
+ You can run the ncnn demo at about **5 FPS** (on a 96-core Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz):
90
+ ```shell
91
+ ./bytetrack palace.mp4
92
+ ```
93
+
94
+ You can modify 'num_threads' in [bytetrack.cpp](https://github.com/ifzhang/ByteTrack/blob/2e9a67895da6b47b948015f6861bba0bacd4e72f/deploy/ncnn/cpp/src/bytetrack.cpp#L309) to match the number of CPU cores on your machine:
95
+
96
+ ```
97
+ yolox.opt.num_threads = 20;
98
+ ```
99
+
100
+
101
+ ## Acknowledgement
102
+
103
+ * [ncnn](https://github.com/Tencent/ncnn)
deploy/ncnn/cpp/include/BYTETracker.h ADDED
@@ -0,0 +1,49 @@
1
+ #pragma once
2
+
3
+ #include "STrack.h"
4
+
5
+ struct Object
6
+ {
7
+ cv::Rect_<float> rect;
8
+ int label;
9
+ float prob;
10
+ };
11
+
12
+ class BYTETracker
13
+ {
14
+ public:
15
+ BYTETracker(int frame_rate = 30, int track_buffer = 30);
16
+ ~BYTETracker();
17
+
18
+ vector<STrack> update(const vector<Object>& objects);
19
+ Scalar get_color(int idx);
20
+
21
+ private:
22
+ vector<STrack*> joint_stracks(vector<STrack*> &tlista, vector<STrack> &tlistb);
23
+ vector<STrack> joint_stracks(vector<STrack> &tlista, vector<STrack> &tlistb);
24
+
25
+ vector<STrack> sub_stracks(vector<STrack> &tlista, vector<STrack> &tlistb);
26
+ void remove_duplicate_stracks(vector<STrack> &resa, vector<STrack> &resb, vector<STrack> &stracksa, vector<STrack> &stracksb);
27
+
28
+ void linear_assignment(vector<vector<float> > &cost_matrix, int cost_matrix_size, int cost_matrix_size_size, float thresh,
29
+ vector<vector<int> > &matches, vector<int> &unmatched_a, vector<int> &unmatched_b);
30
+ vector<vector<float> > iou_distance(vector<STrack*> &atracks, vector<STrack> &btracks, int &dist_size, int &dist_size_size);
31
+ vector<vector<float> > iou_distance(vector<STrack> &atracks, vector<STrack> &btracks);
32
+ vector<vector<float> > ious(vector<vector<float> > &atlbrs, vector<vector<float> > &btlbrs);
33
+
34
+ double lapjv(const vector<vector<float> > &cost, vector<int> &rowsol, vector<int> &colsol,
35
+ bool extend_cost = false, float cost_limit = LONG_MAX, bool return_cost = true);
36
+
37
+ private:
38
+
39
+ float track_thresh;
40
+ float high_thresh;
41
+ float match_thresh;
42
+ int frame_id;
43
+ int max_time_lost;
44
+
45
+ vector<STrack> tracked_stracks;
46
+ vector<STrack> lost_stracks;
47
+ vector<STrack> removed_stracks;
48
+ byte_kalman::KalmanFilter kalman_filter;
49
+ };
deploy/ncnn/cpp/include/STrack.h ADDED
@@ -0,0 +1,50 @@
1
+ #pragma once
2
+
3
+ #include <opencv2/opencv.hpp>
4
+ #include "kalmanFilter.h"
5
+
6
+ using namespace cv;
7
+ using namespace std;
8
+
9
+ enum TrackState { New = 0, Tracked, Lost, Removed };
10
+
11
+ class STrack
12
+ {
13
+ public:
14
+ STrack(vector<float> tlwh_, float score);
15
+ ~STrack();
16
+
17
+ vector<float> static tlbr_to_tlwh(vector<float> &tlbr);
18
+ void static multi_predict(vector<STrack*> &stracks, byte_kalman::KalmanFilter &kalman_filter);
19
+ void static_tlwh();
20
+ void static_tlbr();
21
+ vector<float> tlwh_to_xyah(vector<float> tlwh_tmp);
22
+ vector<float> to_xyah();
23
+ void mark_lost();
24
+ void mark_removed();
25
+ int next_id();
26
+ int end_frame();
27
+
28
+ void activate(byte_kalman::KalmanFilter &kalman_filter, int frame_id);
29
+ void re_activate(STrack &new_track, int frame_id, bool new_id = false);
30
+ void update(STrack &new_track, int frame_id);
31
+
32
+ public:
33
+ bool is_activated;
34
+ int track_id;
35
+ int state;
36
+
37
+ vector<float> _tlwh;
38
+ vector<float> tlwh;
39
+ vector<float> tlbr;
40
+ int frame_id;
41
+ int tracklet_len;
42
+ int start_frame;
43
+
44
+ KAL_MEAN mean;
45
+ KAL_COVA covariance;
46
+ float score;
47
+
48
+ private:
49
+ byte_kalman::KalmanFilter kalman_filter;
50
+ };
deploy/ncnn/cpp/include/dataType.h ADDED
@@ -0,0 +1,36 @@
1
+ #pragma once
2
+
3
+ #include <cstddef>
4
+ #include <vector>
5
+
6
+ #include <Eigen/Core>
7
+ #include <Eigen/Dense>
8
+ typedef Eigen::Matrix<float, 1, 4, Eigen::RowMajor> DETECTBOX;
9
+ typedef Eigen::Matrix<float, -1, 4, Eigen::RowMajor> DETECTBOXSS;
10
+ typedef Eigen::Matrix<float, 1, 128, Eigen::RowMajor> FEATURE;
11
+ typedef Eigen::Matrix<float, Eigen::Dynamic, 128, Eigen::RowMajor> FEATURESS;
12
+ //typedef std::vector<FEATURE> FEATURESS;
13
+
14
+ //Kalmanfilter
15
+ //typedef Eigen::Matrix<float, 8, 8, Eigen::RowMajor> KAL_FILTER;
16
+ typedef Eigen::Matrix<float, 1, 8, Eigen::RowMajor> KAL_MEAN;
17
+ typedef Eigen::Matrix<float, 8, 8, Eigen::RowMajor> KAL_COVA;
18
+ typedef Eigen::Matrix<float, 1, 4, Eigen::RowMajor> KAL_HMEAN;
19
+ typedef Eigen::Matrix<float, 4, 4, Eigen::RowMajor> KAL_HCOVA;
20
+ using KAL_DATA = std::pair<KAL_MEAN, KAL_COVA>;
21
+ using KAL_HDATA = std::pair<KAL_HMEAN, KAL_HCOVA>;
22
+
23
+ //main
24
+ using RESULT_DATA = std::pair<int, DETECTBOX>;
25
+
26
+ //tracker:
27
+ using TRACKER_DATA = std::pair<int, FEATURESS>;
28
+ using MATCH_DATA = std::pair<int, int>;
29
+ typedef struct t {
30
+ std::vector<MATCH_DATA> matches;
31
+ std::vector<int> unmatched_tracks;
32
+ std::vector<int> unmatched_detections;
33
+ }TRACHER_MATCHD;
34
+
35
+ //linear_assignment:
36
+ typedef Eigen::Matrix<float, -1, -1, Eigen::RowMajor> DYNAMICM;
deploy/ncnn/cpp/include/kalmanFilter.h ADDED
@@ -0,0 +1,31 @@
1
+ #pragma once
2
+
3
+ #include "dataType.h"
4
+
5
+ namespace byte_kalman
6
+ {
7
+ class KalmanFilter
8
+ {
9
+ public:
10
+ static const double chi2inv95[10];
11
+ KalmanFilter();
12
+ KAL_DATA initiate(const DETECTBOX& measurement);
13
+ void predict(KAL_MEAN& mean, KAL_COVA& covariance);
14
+ KAL_HDATA project(const KAL_MEAN& mean, const KAL_COVA& covariance);
15
+ KAL_DATA update(const KAL_MEAN& mean,
16
+ const KAL_COVA& covariance,
17
+ const DETECTBOX& measurement);
18
+
19
+ Eigen::Matrix<float, 1, -1> gating_distance(
20
+ const KAL_MEAN& mean,
21
+ const KAL_COVA& covariance,
22
+ const std::vector<DETECTBOX>& measurements,
23
+ bool only_position = false);
24
+
25
+ private:
26
+ Eigen::Matrix<float, 8, 8, Eigen::RowMajor> _motion_mat;
27
+ Eigen::Matrix<float, 4, 8, Eigen::RowMajor> _update_mat;
28
+ float _std_weight_position;
29
+ float _std_weight_velocity;
30
+ };
31
+ }
deploy/ncnn/cpp/include/lapjv.h ADDED
@@ -0,0 +1,63 @@
1
+ #ifndef LAPJV_H
2
+ #define LAPJV_H
3
+
4
+ #define LARGE 1000000
5
+
6
+ #if !defined TRUE
7
+ #define TRUE 1
8
+ #endif
9
+ #if !defined FALSE
10
+ #define FALSE 0
11
+ #endif
12
+
13
+ #define NEW(x, t, n) if ((x = (t *)malloc(sizeof(t) * (n))) == 0) { return -1; }
14
+ #define FREE(x) if (x != 0) { free(x); x = 0; }
15
+ #define SWAP_INDICES(a, b) { int_t _temp_index = a; a = b; b = _temp_index; }
16
+
17
+ #if 0
18
+ #include <assert.h>
19
+ #define ASSERT(cond) assert(cond)
20
+ #define PRINTF(fmt, ...) printf(fmt, ##__VA_ARGS__)
21
+ #define PRINT_COST_ARRAY(a, n) \
22
+ while (1) { \
23
+ printf(#a" = ["); \
24
+ if ((n) > 0) { \
25
+ printf("%f", (a)[0]); \
26
+ for (uint_t j = 1; j < n; j++) { \
27
+ printf(", %f", (a)[j]); \
28
+ } \
29
+ } \
30
+ printf("]\n"); \
31
+ break; \
32
+ }
33
+ #define PRINT_INDEX_ARRAY(a, n) \
34
+ while (1) { \
35
+ printf(#a" = ["); \
36
+ if ((n) > 0) { \
37
+ printf("%d", (a)[0]); \
38
+ for (uint_t j = 1; j < n; j++) { \
39
+ printf(", %d", (a)[j]); \
40
+ } \
41
+ } \
42
+ printf("]\n"); \
43
+ break; \
44
+ }
45
+ #else
46
+ #define ASSERT(cond)
47
+ #define PRINTF(fmt, ...)
48
+ #define PRINT_COST_ARRAY(a, n)
49
+ #define PRINT_INDEX_ARRAY(a, n)
50
+ #endif
51
+
52
+
53
+ typedef signed int int_t;
54
+ typedef unsigned int uint_t;
55
+ typedef double cost_t;
56
+ typedef char boolean;
57
+ typedef enum fp_t { FP_1 = 1, FP_2 = 2, FP_DYNAMIC = 3 } fp_t;
58
+
59
+ extern int_t lapjv_internal(
60
+ const uint_t n, cost_t *cost[],
61
+ int_t *x, int_t *y);
62
+
63
+ #endif // LAPJV_H
deploy/ncnn/cpp/src/BYTETracker.cpp ADDED
@@ -0,0 +1,241 @@
+ #include "BYTETracker.h"
+ #include <fstream>
+ 
+ BYTETracker::BYTETracker(int frame_rate, int track_buffer)
+ {
+     track_thresh = 0.5;
+     high_thresh = 0.6;
+     match_thresh = 0.8;
+ 
+     frame_id = 0;
+     max_time_lost = int(frame_rate / 30.0 * track_buffer);
+     cout << "Init ByteTrack!" << endl;
+ }
+ 
+ BYTETracker::~BYTETracker()
+ {
+ }
+ 
+ vector<STrack> BYTETracker::update(const vector<Object>& objects)
+ {
+     ////////////////// Step 1: Get detections //////////////////
+     this->frame_id++;
+     vector<STrack> activated_stracks;
+     vector<STrack> refind_stracks;
+     vector<STrack> removed_stracks;
+     vector<STrack> lost_stracks;
+     vector<STrack> detections;
+     vector<STrack> detections_low;
+ 
+     vector<STrack> detections_cp;
+     vector<STrack> tracked_stracks_swap;
+     vector<STrack> resa, resb;
+     vector<STrack> output_stracks;
+ 
+     vector<STrack*> unconfirmed;
+     vector<STrack*> tracked_stracks;
+     vector<STrack*> strack_pool;
+     vector<STrack*> r_tracked_stracks;
+ 
+     if (objects.size() > 0)
+     {
+         for (int i = 0; i < objects.size(); i++)
+         {
+             vector<float> tlbr_;
+             tlbr_.resize(4);
+             tlbr_[0] = objects[i].rect.x;
+             tlbr_[1] = objects[i].rect.y;
+             tlbr_[2] = objects[i].rect.x + objects[i].rect.width;
+             tlbr_[3] = objects[i].rect.y + objects[i].rect.height;
+ 
+             float score = objects[i].prob;
+ 
+             STrack strack(STrack::tlbr_to_tlwh(tlbr_), score);
+             if (score >= track_thresh)
+             {
+                 detections.push_back(strack);
+             }
+             else
+             {
+                 detections_low.push_back(strack);
+             }
+         }
+     }
+ 
+     // Add newly detected tracklets to tracked_stracks
+     for (int i = 0; i < this->tracked_stracks.size(); i++)
+     {
+         if (!this->tracked_stracks[i].is_activated)
+             unconfirmed.push_back(&this->tracked_stracks[i]);
+         else
+             tracked_stracks.push_back(&this->tracked_stracks[i]);
+     }
+ 
+     ////////////////// Step 2: First association, with IoU //////////////////
+     strack_pool = joint_stracks(tracked_stracks, this->lost_stracks);
+     STrack::multi_predict(strack_pool, this->kalman_filter);
+ 
+     vector<vector<float> > dists;
+     int dist_size = 0, dist_size_size = 0;
+     dists = iou_distance(strack_pool, detections, dist_size, dist_size_size);
+ 
+     vector<vector<int> > matches;
+     vector<int> u_track, u_detection;
+     linear_assignment(dists, dist_size, dist_size_size, match_thresh, matches, u_track, u_detection);
+ 
+     for (int i = 0; i < matches.size(); i++)
+     {
+         STrack *track = strack_pool[matches[i][0]];
+         STrack *det = &detections[matches[i][1]];
+         if (track->state == TrackState::Tracked)
+         {
+             track->update(*det, this->frame_id);
+             activated_stracks.push_back(*track);
+         }
+         else
+         {
+             track->re_activate(*det, this->frame_id, false);
+             refind_stracks.push_back(*track);
+         }
+     }
+ 
+     ////////////////// Step 3: Second association, using low score dets //////////////////
+     for (int i = 0; i < u_detection.size(); i++)
+     {
+         detections_cp.push_back(detections[u_detection[i]]);
+     }
+     detections.clear();
+     detections.assign(detections_low.begin(), detections_low.end());
+ 
+     for (int i = 0; i < u_track.size(); i++)
+     {
+         if (strack_pool[u_track[i]]->state == TrackState::Tracked)
+         {
+             r_tracked_stracks.push_back(strack_pool[u_track[i]]);
+         }
+     }
+ 
+     dists.clear();
+     dists = iou_distance(r_tracked_stracks, detections, dist_size, dist_size_size);
+ 
+     matches.clear();
+     u_track.clear();
+     u_detection.clear();
+     linear_assignment(dists, dist_size, dist_size_size, 0.5, matches, u_track, u_detection);
+ 
+     for (int i = 0; i < matches.size(); i++)
+     {
+         STrack *track = r_tracked_stracks[matches[i][0]];
+         STrack *det = &detections[matches[i][1]];
+         if (track->state == TrackState::Tracked)
+         {
+             track->update(*det, this->frame_id);
+             activated_stracks.push_back(*track);
+         }
+         else
+         {
+             track->re_activate(*det, this->frame_id, false);
+             refind_stracks.push_back(*track);
+         }
+     }
+ 
+     for (int i = 0; i < u_track.size(); i++)
+     {
+         STrack *track = r_tracked_stracks[u_track[i]];
+         if (track->state != TrackState::Lost)
+         {
+             track->mark_lost();
+             lost_stracks.push_back(*track);
+         }
+     }
+ 
+     // Deal with unconfirmed tracks, usually tracks with only one beginning frame
+     detections.clear();
+     detections.assign(detections_cp.begin(), detections_cp.end());
+ 
+     dists.clear();
+     dists = iou_distance(unconfirmed, detections, dist_size, dist_size_size);
+ 
+     matches.clear();
+     vector<int> u_unconfirmed;
+     u_detection.clear();
+     linear_assignment(dists, dist_size, dist_size_size, 0.7, matches, u_unconfirmed, u_detection);
+ 
+     for (int i = 0; i < matches.size(); i++)
+     {
+         unconfirmed[matches[i][0]]->update(detections[matches[i][1]], this->frame_id);
+         activated_stracks.push_back(*unconfirmed[matches[i][0]]);
+     }
+ 
+     for (int i = 0; i < u_unconfirmed.size(); i++)
+     {
+         STrack *track = unconfirmed[u_unconfirmed[i]];
+         track->mark_removed();
+         removed_stracks.push_back(*track);
+     }
+ 
+     ////////////////// Step 4: Init new stracks //////////////////
+     for (int i = 0; i < u_detection.size(); i++)
+     {
+         STrack *track = &detections[u_detection[i]];
+         if (track->score < this->high_thresh)
+             continue;
+         track->activate(this->kalman_filter, this->frame_id);
+         activated_stracks.push_back(*track);
+     }
+ 
+     ////////////////// Step 5: Update state //////////////////
+     for (int i = 0; i < this->lost_stracks.size(); i++)
+     {
+         if (this->frame_id - this->lost_stracks[i].end_frame() > this->max_time_lost)
+         {
+             this->lost_stracks[i].mark_removed();
+             removed_stracks.push_back(this->lost_stracks[i]);
+         }
+     }
+ 
+     for (int i = 0; i < this->tracked_stracks.size(); i++)
+     {
+         if (this->tracked_stracks[i].state == TrackState::Tracked)
+         {
+             tracked_stracks_swap.push_back(this->tracked_stracks[i]);
+         }
+     }
+     this->tracked_stracks.clear();
+     this->tracked_stracks.assign(tracked_stracks_swap.begin(), tracked_stracks_swap.end());
+ 
+     this->tracked_stracks = joint_stracks(this->tracked_stracks, activated_stracks);
+     this->tracked_stracks = joint_stracks(this->tracked_stracks, refind_stracks);
+ 
+     //std::cout << activated_stracks.size() << std::endl;
+ 
+     this->lost_stracks = sub_stracks(this->lost_stracks, this->tracked_stracks);
+     for (int i = 0; i < lost_stracks.size(); i++)
+     {
+         this->lost_stracks.push_back(lost_stracks[i]);
+     }
+ 
+     this->lost_stracks = sub_stracks(this->lost_stracks, this->removed_stracks);
+     for (int i = 0; i < removed_stracks.size(); i++)
+     {
+         this->removed_stracks.push_back(removed_stracks[i]);
+     }
+ 
+     remove_duplicate_stracks(resa, resb, this->tracked_stracks, this->lost_stracks);
+ 
+     this->tracked_stracks.clear();
+     this->tracked_stracks.assign(resa.begin(), resa.end());
+     this->lost_stracks.clear();
+     this->lost_stracks.assign(resb.begin(), resb.end());
+ 
+     for (int i = 0; i < this->tracked_stracks.size(); i++)
+     {
+         if (this->tracked_stracks[i].is_activated)
+         {
+             output_stracks.push_back(this->tracked_stracks[i]);
+         }
+     }
+     return output_stracks;
+ }
deploy/ncnn/cpp/src/STrack.cpp ADDED
@@ -0,0 +1,192 @@
+ #include "STrack.h"
+ 
+ STrack::STrack(vector<float> tlwh_, float score)
+ {
+     _tlwh.resize(4);
+     _tlwh.assign(tlwh_.begin(), tlwh_.end());
+ 
+     is_activated = false;
+     track_id = 0;
+     state = TrackState::New;
+ 
+     tlwh.resize(4);
+     tlbr.resize(4);
+ 
+     static_tlwh();
+     static_tlbr();
+     frame_id = 0;
+     tracklet_len = 0;
+     this->score = score;
+     start_frame = 0;
+ }
+ 
+ STrack::~STrack()
+ {
+ }
+ 
+ void STrack::activate(byte_kalman::KalmanFilter &kalman_filter, int frame_id)
+ {
+     this->kalman_filter = kalman_filter;
+     this->track_id = this->next_id();
+ 
+     vector<float> _tlwh_tmp(4);
+     _tlwh_tmp[0] = this->_tlwh[0];
+     _tlwh_tmp[1] = this->_tlwh[1];
+     _tlwh_tmp[2] = this->_tlwh[2];
+     _tlwh_tmp[3] = this->_tlwh[3];
+     vector<float> xyah = tlwh_to_xyah(_tlwh_tmp);
+     DETECTBOX xyah_box;
+     xyah_box[0] = xyah[0];
+     xyah_box[1] = xyah[1];
+     xyah_box[2] = xyah[2];
+     xyah_box[3] = xyah[3];
+     auto mc = this->kalman_filter.initiate(xyah_box);
+     this->mean = mc.first;
+     this->covariance = mc.second;
+ 
+     static_tlwh();
+     static_tlbr();
+ 
+     this->tracklet_len = 0;
+     this->state = TrackState::Tracked;
+     if (frame_id == 1)
+     {
+         this->is_activated = true;
+     }
+     //this->is_activated = true;
+     this->frame_id = frame_id;
+     this->start_frame = frame_id;
+ }
+ 
+ void STrack::re_activate(STrack &new_track, int frame_id, bool new_id)
+ {
+     vector<float> xyah = tlwh_to_xyah(new_track.tlwh);
+     DETECTBOX xyah_box;
+     xyah_box[0] = xyah[0];
+     xyah_box[1] = xyah[1];
+     xyah_box[2] = xyah[2];
+     xyah_box[3] = xyah[3];
+     auto mc = this->kalman_filter.update(this->mean, this->covariance, xyah_box);
+     this->mean = mc.first;
+     this->covariance = mc.second;
+ 
+     static_tlwh();
+     static_tlbr();
+ 
+     this->tracklet_len = 0;
+     this->state = TrackState::Tracked;
+     this->is_activated = true;
+     this->frame_id = frame_id;
+     this->score = new_track.score;
+     if (new_id)
+         this->track_id = next_id();
+ }
+ 
+ void STrack::update(STrack &new_track, int frame_id)
+ {
+     this->frame_id = frame_id;
+     this->tracklet_len++;
+ 
+     vector<float> xyah = tlwh_to_xyah(new_track.tlwh);
+     DETECTBOX xyah_box;
+     xyah_box[0] = xyah[0];
+     xyah_box[1] = xyah[1];
+     xyah_box[2] = xyah[2];
+     xyah_box[3] = xyah[3];
+ 
+     auto mc = this->kalman_filter.update(this->mean, this->covariance, xyah_box);
+     this->mean = mc.first;
+     this->covariance = mc.second;
+ 
+     static_tlwh();
+     static_tlbr();
+ 
+     this->state = TrackState::Tracked;
+     this->is_activated = true;
+ 
+     this->score = new_track.score;
+ }
+ 
+ void STrack::static_tlwh()
+ {
+     if (this->state == TrackState::New)
+     {
+         tlwh[0] = _tlwh[0];
+         tlwh[1] = _tlwh[1];
+         tlwh[2] = _tlwh[2];
+         tlwh[3] = _tlwh[3];
+         return;
+     }
+ 
+     tlwh[0] = mean[0];
+     tlwh[1] = mean[1];
+     tlwh[2] = mean[2];
+     tlwh[3] = mean[3];
+ 
+     tlwh[2] *= tlwh[3];
+     tlwh[0] -= tlwh[2] / 2;
+     tlwh[1] -= tlwh[3] / 2;
+ }
+ 
+ void STrack::static_tlbr()
+ {
+     tlbr.clear();
+     tlbr.assign(tlwh.begin(), tlwh.end());
+     tlbr[2] += tlbr[0];
+     tlbr[3] += tlbr[1];
+ }
+ 
+ vector<float> STrack::tlwh_to_xyah(vector<float> tlwh_tmp)
+ {
+     vector<float> tlwh_output = tlwh_tmp;
+     tlwh_output[0] += tlwh_output[2] / 2;
+     tlwh_output[1] += tlwh_output[3] / 2;
+     tlwh_output[2] /= tlwh_output[3];
+     return tlwh_output;
+ }
+ 
+ vector<float> STrack::to_xyah()
+ {
+     return tlwh_to_xyah(tlwh);
+ }
+ 
+ vector<float> STrack::tlbr_to_tlwh(vector<float> &tlbr)
+ {
+     tlbr[2] -= tlbr[0];
+     tlbr[3] -= tlbr[1];
+     return tlbr;
+ }
+ 
+ void STrack::mark_lost()
+ {
+     state = TrackState::Lost;
+ }
+ 
+ void STrack::mark_removed()
+ {
+     state = TrackState::Removed;
+ }
+ 
+ int STrack::next_id()
+ {
+     static int _count = 0;
+     _count++;
+     return _count;
+ }
+ 
+ int STrack::end_frame()
+ {
+     return this->frame_id;
+ }
+ 
+ void STrack::multi_predict(vector<STrack*> &stracks, byte_kalman::KalmanFilter &kalman_filter)
+ {
+     for (int i = 0; i < stracks.size(); i++)
+     {
+         if (stracks[i]->state != TrackState::Tracked)
+         {
+             stracks[i]->mean[7] = 0;
+         }
+         kalman_filter.predict(stracks[i]->mean, stracks[i]->covariance);
+     }
+ }
deploy/ncnn/cpp/src/bytetrack.cpp ADDED
@@ -0,0 +1,396 @@
+ #include "layer.h"
+ #include "net.h"
+ 
+ #if defined(USE_NCNN_SIMPLEOCV)
+ #include "simpleocv.h"
+ #else
+ #include <opencv2/core/core.hpp>
+ #include <opencv2/highgui/highgui.hpp>
+ #include <opencv2/imgproc/imgproc.hpp>
+ #include <opencv2/opencv.hpp>
+ #endif
+ #include <float.h>
+ #include <stdio.h>
+ #include <vector>
+ #include <chrono>
+ #include "BYTETracker.h"
+ 
+ #define YOLOX_NMS_THRESH 0.7 // nms threshold
+ #define YOLOX_CONF_THRESH 0.1 // threshold of bounding box prob
+ #define INPUT_W 1088 // target image size w after resize
+ #define INPUT_H 608 // target image size h after resize
+ 
+ Mat static_resize(Mat& img) {
+     float r = min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
+     // r = std::min(r, 1.0f);
+     int unpad_w = r * img.cols;
+     int unpad_h = r * img.rows;
+     Mat re(unpad_h, unpad_w, CV_8UC3);
+     resize(img, re, re.size());
+     Mat out(INPUT_H, INPUT_W, CV_8UC3, Scalar(114, 114, 114));
+     re.copyTo(out(Rect(0, 0, re.cols, re.rows)));
+     return out;
+ }
+ 
+ // YOLOX uses the same Focus layer as yolov5
+ class YoloV5Focus : public ncnn::Layer
+ {
+ public:
+     YoloV5Focus()
+     {
+         one_blob_only = true;
+     }
+ 
+     virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
+     {
+         int w = bottom_blob.w;
+         int h = bottom_blob.h;
+         int channels = bottom_blob.c;
+ 
+         int outw = w / 2;
+         int outh = h / 2;
+         int outc = channels * 4;
+ 
+         top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
+         if (top_blob.empty())
+             return -100;
+ 
+         #pragma omp parallel for num_threads(opt.num_threads)
+         for (int p = 0; p < outc; p++)
+         {
+             const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
+             float* outptr = top_blob.channel(p);
+ 
+             for (int i = 0; i < outh; i++)
+             {
+                 for (int j = 0; j < outw; j++)
+                 {
+                     *outptr = *ptr;
+ 
+                     outptr += 1;
+                     ptr += 2;
+                 }
+ 
+                 ptr += w;
+             }
+         }
+ 
+         return 0;
+     }
+ };
+ 
+ DEFINE_LAYER_CREATOR(YoloV5Focus)
+ 
+ struct GridAndStride
+ {
+     int grid0;
+     int grid1;
+     int stride;
+ };
+ 
+ static inline float intersection_area(const Object& a, const Object& b)
+ {
+     cv::Rect_<float> inter = a.rect & b.rect;
+     return inter.area();
+ }
+ 
+ static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
+ {
+     int i = left;
+     int j = right;
+     float p = faceobjects[(left + right) / 2].prob;
+ 
+     while (i <= j)
+     {
+         while (faceobjects[i].prob > p)
+             i++;
+ 
+         while (faceobjects[j].prob < p)
+             j--;
+ 
+         if (i <= j)
+         {
+             // swap
+             std::swap(faceobjects[i], faceobjects[j]);
+ 
+             i++;
+             j--;
+         }
+     }
+ 
+     #pragma omp parallel sections
+     {
+         #pragma omp section
+         {
+             if (left < j) qsort_descent_inplace(faceobjects, left, j);
+         }
+         #pragma omp section
+         {
+             if (i < right) qsort_descent_inplace(faceobjects, i, right);
+         }
+     }
+ }
+ 
+ static void qsort_descent_inplace(std::vector<Object>& objects)
+ {
+     if (objects.empty())
+         return;
+ 
+     qsort_descent_inplace(objects, 0, objects.size() - 1);
+ }
+ 
+ static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
+ {
+     picked.clear();
+ 
+     const int n = faceobjects.size();
+ 
+     std::vector<float> areas(n);
+     for (int i = 0; i < n; i++)
+     {
+         areas[i] = faceobjects[i].rect.area();
+     }
+ 
+     for (int i = 0; i < n; i++)
+     {
+         const Object& a = faceobjects[i];
+ 
+         int keep = 1;
+         for (int j = 0; j < (int)picked.size(); j++)
+         {
+             const Object& b = faceobjects[picked[j]];
+ 
+             // intersection over union
+             float inter_area = intersection_area(a, b);
+             float union_area = areas[i] + areas[picked[j]] - inter_area;
+             // float IoU = inter_area / union_area
+             if (inter_area / union_area > nms_threshold)
+                 keep = 0;
+         }
+ 
+         if (keep)
+             picked.push_back(i);
+     }
+ }
+ 
+ static void generate_grids_and_stride(const int target_w, const int target_h, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
+ {
+     for (int i = 0; i < (int)strides.size(); i++)
+     {
+         int stride = strides[i];
+         int num_grid_w = target_w / stride;
+         int num_grid_h = target_h / stride;
+         for (int g1 = 0; g1 < num_grid_h; g1++)
+         {
+             for (int g0 = 0; g0 < num_grid_w; g0++)
+             {
+                 GridAndStride gs;
+                 gs.grid0 = g0;
+                 gs.grid1 = g1;
+                 gs.stride = stride;
+                 grid_strides.push_back(gs);
+             }
+         }
+     }
+ }
+ 
+ static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const ncnn::Mat& feat_blob, float prob_threshold, std::vector<Object>& objects)
+ {
+     const int num_class = feat_blob.w - 5;
+     const int num_anchors = grid_strides.size();
+ 
+     const float* feat_ptr = feat_blob.channel(0);
+     for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
+     {
+         const int grid0 = grid_strides[anchor_idx].grid0;
+         const int grid1 = grid_strides[anchor_idx].grid1;
+         const int stride = grid_strides[anchor_idx].stride;
+ 
+         // yolox/models/yolo_head.py decode logic
+         // outputs[..., :2] = (outputs[..., :2] + grids) * strides
+         // outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
+         float x_center = (feat_ptr[0] + grid0) * stride;
+         float y_center = (feat_ptr[1] + grid1) * stride;
+         float w = exp(feat_ptr[2]) * stride;
+         float h = exp(feat_ptr[3]) * stride;
+         float x0 = x_center - w * 0.5f;
+         float y0 = y_center - h * 0.5f;
+ 
+         float box_objectness = feat_ptr[4];
+         for (int class_idx = 0; class_idx < num_class; class_idx++)
+         {
+             float box_cls_score = feat_ptr[5 + class_idx];
+             float box_prob = box_objectness * box_cls_score;
+             if (box_prob > prob_threshold)
+             {
+                 Object obj;
+                 obj.rect.x = x0;
+                 obj.rect.y = y0;
+                 obj.rect.width = w;
+                 obj.rect.height = h;
+                 obj.label = class_idx;
+                 obj.prob = box_prob;
+ 
+                 objects.push_back(obj);
+             }
+         } // class loop
+         feat_ptr += feat_blob.w;
+     } // point anchor loop
+ }
+ 
+ static int detect_yolox(ncnn::Mat& in_pad, std::vector<Object>& objects, ncnn::Extractor ex, float scale)
+ {
+     ex.input("images", in_pad);
+ 
+     std::vector<Object> proposals;
+ 
+     {
+         ncnn::Mat out;
+         ex.extract("output", out);
+ 
+         static const int stride_arr[] = {8, 16, 32}; // might have stride=64 in YOLOX
+         std::vector<int> strides(stride_arr, stride_arr + sizeof(stride_arr) / sizeof(stride_arr[0]));
+         std::vector<GridAndStride> grid_strides;
+         generate_grids_and_stride(INPUT_W, INPUT_H, strides, grid_strides);
+         generate_yolox_proposals(grid_strides, out, YOLOX_CONF_THRESH, proposals);
+     }
+ 
+     // sort all proposals by score from highest to lowest
+     qsort_descent_inplace(proposals);
+ 
+     // apply nms with nms_threshold
+     std::vector<int> picked;
+     nms_sorted_bboxes(proposals, picked, YOLOX_NMS_THRESH);
+ 
+     int count = picked.size();
+ 
+     objects.resize(count);
+     for (int i = 0; i < count; i++)
+     {
+         objects[i] = proposals[picked[i]];
+ 
+         // adjust offset to original unpadded image
+         float x0 = (objects[i].rect.x) / scale;
+         float y0 = (objects[i].rect.y) / scale;
+         float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
+         float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;
+ 
+         // clip
+         // x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
+         // y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
+         // x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
+         // y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);
+ 
+         objects[i].rect.x = x0;
+         objects[i].rect.y = y0;
+         objects[i].rect.width = x1 - x0;
+         objects[i].rect.height = y1 - y0;
+     }
+ 
+     return 0;
+ }
+ 
+ int main(int argc, char** argv)
+ {
+     if (argc != 2)
+     {
+         fprintf(stderr, "Usage: %s [videopath]\n", argv[0]);
+         return -1;
+     }
+ 
+     ncnn::Net yolox;
+ 
+     //yolox.opt.use_vulkan_compute = true;
+     //yolox.opt.use_bf16_storage = true;
+     yolox.opt.num_threads = 20;
+     //ncnn::set_cpu_powersave(0);
+     //ncnn::set_omp_dynamic(0);
+     //ncnn::set_omp_num_threads(20);
+ 
+     // Focus layer from yolov5
+     yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
+ 
+     yolox.load_param("bytetrack_s_op.param");
+     yolox.load_model("bytetrack_s_op.bin");
+ 
+     ncnn::Extractor ex = yolox.create_extractor();
+ 
+     const char* videopath = argv[1];
+ 
+     VideoCapture cap(videopath);
+     if (!cap.isOpened())
+         return 0;
+ 
+     int img_w = cap.get(CV_CAP_PROP_FRAME_WIDTH);
+     int img_h = cap.get(CV_CAP_PROP_FRAME_HEIGHT);
+     int fps = cap.get(CV_CAP_PROP_FPS);
+     long nFrame = static_cast<long>(cap.get(CV_CAP_PROP_FRAME_COUNT));
+     cout << "Total frames: " << nFrame << endl;
+ 
+     VideoWriter writer("demo.mp4", CV_FOURCC('m', 'p', '4', 'v'), fps, Size(img_w, img_h));
+ 
+     Mat img;
+     BYTETracker tracker(fps, 30);
+     int num_frames = 0;
+     int total_ms = 1;
+     for (;;)
+     {
+         if (!cap.read(img))
+             break;
+         num_frames++;
+         if (num_frames % 20 == 0)
+         {
+             cout << "Processing frame " << num_frames << " (" << num_frames * 1000000 / total_ms << " fps)" << endl;
+         }
+         if (img.empty())
+             break;
+ 
+         float scale = min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
+         Mat pr_img = static_resize(img);
+         ncnn::Mat in_pad = ncnn::Mat::from_pixels_resize(pr_img.data, ncnn::Mat::PIXEL_BGR2RGB, INPUT_W, INPUT_H, INPUT_W, INPUT_H);
+ 
+         // python 0-1 input tensor with rgb_means = (0.485, 0.456, 0.406), std = (0.229, 0.224, 0.225)
+         // so for 0-255 input image, rgb_mean should multiply 255 and norm should div by std.
+         const float mean_vals[3] = {255.f * 0.485f, 255.f * 0.456f, 255.f * 0.406f};
+         const float norm_vals[3] = {1 / (255.f * 0.229f), 1 / (255.f * 0.224f), 1 / (255.f * 0.225f)};
+ 
+         in_pad.substract_mean_normalize(mean_vals, norm_vals);
+ 
+         std::vector<Object> objects;
+         auto start = chrono::system_clock::now();
+         //detect_yolox(img, objects);
+         detect_yolox(in_pad, objects, ex, scale);
+         vector<STrack> output_stracks = tracker.update(objects);
+         auto end = chrono::system_clock::now();
+         total_ms = total_ms + chrono::duration_cast<chrono::microseconds>(end - start).count();
+         for (int i = 0; i < output_stracks.size(); i++)
+         {
+             vector<float> tlwh = output_stracks[i].tlwh;
+             bool vertical = tlwh[2] / tlwh[3] > 1.6;
+             if (tlwh[2] * tlwh[3] > 20 && !vertical)
+             {
+                 Scalar s = tracker.get_color(output_stracks[i].track_id);
+                 putText(img, format("%d", output_stracks[i].track_id), Point(tlwh[0], tlwh[1] - 5),
+                         0, 0.6, Scalar(0, 0, 255), 2, LINE_AA);
+                 rectangle(img, Rect(tlwh[0], tlwh[1], tlwh[2], tlwh[3]), s, 2);
+             }
+         }
+         putText(img, format("frame: %d fps: %d num: %d", num_frames, num_frames * 1000000 / total_ms, output_stracks.size()),
+                 Point(0, 30), 0, 0.6, Scalar(0, 0, 255), 2, LINE_AA);
+         writer.write(img);
+         char c = waitKey(1);
+         if (c > 0)
+         {
+             break;
+         }
+     }
+     cap.release();
+     cout << "FPS: " << num_frames * 1000000 / total_ms << endl;
+ 
+     return 0;
+ }
deploy/ncnn/cpp/src/kalmanFilter.cpp ADDED
@@ -0,0 +1,152 @@
+ #include "kalmanFilter.h"
+ #include <Eigen/Cholesky>
+ 
+ namespace byte_kalman
+ {
+     const double KalmanFilter::chi2inv95[10] = {
+         0,
+         3.8415,
+         5.9915,
+         7.8147,
+         9.4877,
+         11.070,
+         12.592,
+         14.067,
+         15.507,
+         16.919
+     };
+ 
+     KalmanFilter::KalmanFilter()
+     {
+         int ndim = 4;
+         double dt = 1.;
+ 
+         _motion_mat = Eigen::MatrixXf::Identity(8, 8);
+         for (int i = 0; i < ndim; i++) {
+             _motion_mat(i, ndim + i) = dt;
+         }
+         _update_mat = Eigen::MatrixXf::Identity(4, 8);
+ 
+         this->_std_weight_position = 1. / 20;
+         this->_std_weight_velocity = 1. / 160;
+     }
+ 
+     KAL_DATA KalmanFilter::initiate(const DETECTBOX &measurement)
+     {
+         DETECTBOX mean_pos = measurement;
+         DETECTBOX mean_vel;
+         for (int i = 0; i < 4; i++) mean_vel(i) = 0;
+ 
+         KAL_MEAN mean;
+         for (int i = 0; i < 8; i++) {
+             if (i < 4) mean(i) = mean_pos(i);
+             else mean(i) = mean_vel(i - 4);
+         }
+ 
+         KAL_MEAN std;
+         std(0) = 2 * _std_weight_position * measurement[3];
+         std(1) = 2 * _std_weight_position * measurement[3];
+         std(2) = 1e-2;
+         std(3) = 2 * _std_weight_position * measurement[3];
+         std(4) = 10 * _std_weight_velocity * measurement[3];
+         std(5) = 10 * _std_weight_velocity * measurement[3];
+         std(6) = 1e-5;
+         std(7) = 10 * _std_weight_velocity * measurement[3];
+ 
+         KAL_MEAN tmp = std.array().square();
+         KAL_COVA var = tmp.asDiagonal();
+         return std::make_pair(mean, var);
+     }
+ 
+     void KalmanFilter::predict(KAL_MEAN &mean, KAL_COVA &covariance)
+     {
+         // revise the data
+         DETECTBOX std_pos;
+         std_pos << _std_weight_position * mean(3),
+             _std_weight_position * mean(3),
+             1e-2,
+             _std_weight_position * mean(3);
+         DETECTBOX std_vel;
+         std_vel << _std_weight_velocity * mean(3),
+             _std_weight_velocity * mean(3),
+             1e-5,
+             _std_weight_velocity * mean(3);
+         KAL_MEAN tmp;
+         tmp.block<1, 4>(0, 0) = std_pos;
+         tmp.block<1, 4>(0, 4) = std_vel;
+         tmp = tmp.array().square();
+         KAL_COVA motion_cov = tmp.asDiagonal();
+         KAL_MEAN mean1 = this->_motion_mat * mean.transpose();
+         KAL_COVA covariance1 = this->_motion_mat * covariance * (_motion_mat.transpose());
+         covariance1 += motion_cov;
+ 
+         mean = mean1;
+         covariance = covariance1;
+     }
+ 
+     KAL_HDATA KalmanFilter::project(const KAL_MEAN &mean, const KAL_COVA &covariance)
+     {
+         DETECTBOX std;
+         std << _std_weight_position * mean(3), _std_weight_position * mean(3),
+             1e-1, _std_weight_position * mean(3);
+         KAL_HMEAN mean1 = _update_mat * mean.transpose();
+         KAL_HCOVA covariance1 = _update_mat * covariance * (_update_mat.transpose());
+         Eigen::Matrix<float, 4, 4> diag = std.asDiagonal();
+         diag = diag.array().square().matrix();
+         covariance1 += diag;
+         // covariance1.diagonal() << diag;
+         return std::make_pair(mean1, covariance1);
+     }
+ 
+     KAL_DATA
+     KalmanFilter::update(
+         const KAL_MEAN &mean,
+         const KAL_COVA &covariance,
+         const DETECTBOX &measurement)
+     {
+         KAL_HDATA pa = project(mean, covariance);
+         KAL_HMEAN projected_mean = pa.first;
+         KAL_HCOVA projected_cov = pa.second;
+ 
+         // chol_factor, lower =
+         //     scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False)
+         // kalman_gain =
+         //     scipy.linalg.cho_solve((chol_factor, lower),
+         //                            np.dot(covariance, self._update_mat.T).T,
+         //                            check_finite=False).T
+         Eigen::Matrix<float, 4, 8> B = (covariance * (_update_mat.transpose())).transpose();
+         Eigen::Matrix<float, 8, 4> kalman_gain = (projected_cov.llt().solve(B)).transpose(); // eg. 8x4
+         Eigen::Matrix<float, 1, 4> innovation = measurement - projected_mean; // eg. 1x4
+         auto tmp = innovation * (kalman_gain.transpose());
+         KAL_MEAN new_mean = (mean.array() + tmp.array()).matrix();
+         KAL_COVA new_covariance = covariance - kalman_gain * projected_cov * (kalman_gain.transpose());
+         return std::make_pair(new_mean, new_covariance);
+     }
+ 
+     Eigen::Matrix<float, 1, -1>
+     KalmanFilter::gating_distance(
+         const KAL_MEAN &mean,
+         const KAL_COVA &covariance,
+         const std::vector<DETECTBOX> &measurements,
+         bool only_position)
+     {
+         KAL_HDATA pa = this->project(mean, covariance);
+         if (only_position) {
+             printf("not implemented!");
+             exit(0);
+         }
+         KAL_HMEAN mean1 = pa.first;
+         KAL_HCOVA covariance1 = pa.second;
+ 
+         // Eigen::Matrix<float, -1, 4, Eigen::RowMajor> d(size, 4);
+         DETECTBOXSS d(measurements.size(), 4);
+         int pos = 0;
+         for (DETECTBOX box : measurements) {
+             d.row(pos++) = box - mean1;
+         }
+         Eigen::Matrix<float, -1, -1, Eigen::RowMajor> factor = covariance1.llt().matrixL();
+         Eigen::Matrix<float, -1, -1> z = factor.triangularView<Eigen::Lower>().solve<Eigen::OnTheRight>(d).transpose();
+         auto zz = ((z.array()) * (z.array())).matrix();
+         auto square_maha = zz.colwise().sum();
+         return square_maha;
+     }
+ }
deploy/ncnn/cpp/src/lapjv.cpp ADDED
@@ -0,0 +1,343 @@
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+ 
+ #include "lapjv.h"
+ 
+ /** Column-reduction and reduction transfer for a dense cost matrix.
+  */
+ int_t _ccrrt_dense(const uint_t n, cost_t *cost[],
+                    int_t *free_rows, int_t *x, int_t *y, cost_t *v)
+ {
+     int_t n_free_rows;
+     boolean *unique;
+ 
+     for (uint_t i = 0; i < n; i++) {
+         x[i] = -1;
+         v[i] = LARGE;
+         y[i] = 0;
+     }
+     for (uint_t i = 0; i < n; i++) {
+         for (uint_t j = 0; j < n; j++) {
+             const cost_t c = cost[i][j];
+             if (c < v[j]) {
+                 v[j] = c;
+                 y[j] = i;
+             }
+             PRINTF("i=%d, j=%d, c[i,j]=%f, v[j]=%f y[j]=%d\n", i, j, c, v[j], y[j]);
+         }
+     }
+     PRINT_COST_ARRAY(v, n);
+     PRINT_INDEX_ARRAY(y, n);
+     NEW(unique, boolean, n);
+     memset(unique, TRUE, n);
+     {
+         int_t j = n;
+         do {
+             j--;
+             const int_t i = y[j];
+             if (x[i] < 0) {
+                 x[i] = j;
+             }
+             else {
+                 unique[i] = FALSE;
+                 y[j] = -1;
+             }
+         } while (j > 0);
+     }
+     n_free_rows = 0;
+     for (uint_t i = 0; i < n; i++) {
+         if (x[i] < 0) {
+             free_rows[n_free_rows++] = i;
+         }
+         else if (unique[i]) {
+             const int_t j = x[i];
+             cost_t min = LARGE;
+             for (uint_t j2 = 0; j2 < n; j2++) {
+                 if (j2 == (uint_t)j) {
+                     continue;
+                 }
+                 const cost_t c = cost[i][j2] - v[j2];
+                 if (c < min) {
+                     min = c;
+                 }
+             }
+             PRINTF("v[%d] = %f - %f\n", j, v[j], min);
+             v[j] -= min;
+         }
+     }
+     FREE(unique);
+     return n_free_rows;
+ }
+ 
+ /** Augmenting row reduction for a dense cost matrix.
+  */
+ int_t _carr_dense(
+     const uint_t n, cost_t *cost[],
+     const uint_t n_free_rows,
+     int_t *free_rows, int_t *x, int_t *y, cost_t *v)
+ {
+     uint_t current = 0;
+     int_t new_free_rows = 0;
+     uint_t rr_cnt = 0;
+     PRINT_INDEX_ARRAY(x, n);
+     PRINT_INDEX_ARRAY(y, n);
+     PRINT_COST_ARRAY(v, n);
+     PRINT_INDEX_ARRAY(free_rows, n_free_rows);
+     while (current < n_free_rows) {
+         int_t i0;
+         int_t j1, j2;
+         cost_t v1, v2, v1_new;
+         boolean v1_lowers;
+ 
+         rr_cnt++;
+         PRINTF("current = %d rr_cnt = %d\n", current, rr_cnt);
+         const int_t free_i = free_rows[current++];
+         j1 = 0;
+         v1 = cost[free_i][0] - v[0];
+         j2 = -1;
+         v2 = LARGE;
+         for (uint_t j = 1; j < n; j++) {
+             PRINTF("%d = %f %d = %f\n", j1, v1, j2, v2);
+             const cost_t c = cost[free_i][j] - v[j];
+             if (c < v2) {
+                 if (c >= v1) {
+                     v2 = c;
+                     j2 = j;
+                 }
+                 else {
+                     v2 = v1;
111
+ v1 = c;
112
+ j2 = j1;
113
+ j1 = j;
114
+ }
115
+ }
116
+ }
117
+ i0 = y[j1];
118
+ v1_new = v[j1] - (v2 - v1);
119
+ v1_lowers = v1_new < v[j1];
120
+ PRINTF("%d %d 1=%d,%f 2=%d,%f v1'=%f(%d,%g) \n", free_i, i0, j1, v1, j2, v2, v1_new, v1_lowers, v[j1] - v1_new);
121
+ if (rr_cnt < current * n) {
122
+ if (v1_lowers) {
123
+ v[j1] = v1_new;
124
+ }
125
+ else if (i0 >= 0 && j2 >= 0) {
126
+ j1 = j2;
127
+ i0 = y[j2];
128
+ }
129
+ if (i0 >= 0) {
130
+ if (v1_lowers) {
131
+ free_rows[--current] = i0;
132
+ }
133
+ else {
134
+ free_rows[new_free_rows++] = i0;
135
+ }
136
+ }
137
+ }
138
+ else {
139
+ PRINTF("rr_cnt=%d >= %d (current=%d * n=%d)\n", rr_cnt, current * n, current, n);
140
+ if (i0 >= 0) {
141
+ free_rows[new_free_rows++] = i0;
142
+ }
143
+ }
144
+ x[free_i] = j1;
145
+ y[j1] = free_i;
146
+ }
147
+ return new_free_rows;
148
+ }
149
+
150
+
151
+ /** Find columns with minimum d[j] and put them on the SCAN list.
152
+ */
153
+ uint_t _find_dense(const uint_t n, uint_t lo, cost_t *d, int_t *cols, int_t *y)
154
+ {
155
+ uint_t hi = lo + 1;
156
+ cost_t mind = d[cols[lo]];
157
+ for (uint_t k = hi; k < n; k++) {
158
+ int_t j = cols[k];
159
+ if (d[j] <= mind) {
160
+ if (d[j] < mind) {
161
+ hi = lo;
162
+ mind = d[j];
163
+ }
164
+ cols[k] = cols[hi];
165
+ cols[hi++] = j;
166
+ }
167
+ }
168
+ return hi;
169
+ }
170
+
171
+
172
+ // Scan all columns in TODO starting from arbitrary column in SCAN
173
+ // and try to decrease d of the TODO columns using the SCAN column.
174
+ int_t _scan_dense(const uint_t n, cost_t *cost[],
175
+ uint_t *plo, uint_t*phi,
176
+ cost_t *d, int_t *cols, int_t *pred,
177
+ int_t *y, cost_t *v)
178
+ {
179
+ uint_t lo = *plo;
180
+ uint_t hi = *phi;
181
+ cost_t h, cred_ij;
182
+
183
+ while (lo != hi) {
184
+ int_t j = cols[lo++];
185
+ const int_t i = y[j];
186
+ const cost_t mind = d[j];
187
+ h = cost[i][j] - v[j] - mind;
188
+ PRINTF("i=%d j=%d h=%f\n", i, j, h);
189
+ // For all columns in TODO
190
+ for (uint_t k = hi; k < n; k++) {
191
+ j = cols[k];
192
+ cred_ij = cost[i][j] - v[j] - h;
193
+ if (cred_ij < d[j]) {
194
+ d[j] = cred_ij;
195
+ pred[j] = i;
196
+ if (cred_ij == mind) {
197
+ if (y[j] < 0) {
198
+ return j;
199
+ }
200
+ cols[k] = cols[hi];
201
+ cols[hi++] = j;
202
+ }
203
+ }
204
+ }
205
+ }
206
+ *plo = lo;
207
+ *phi = hi;
208
+ return -1;
209
+ }
210
+
211
+
212
+ /** Single iteration of modified Dijkstra shortest path algorithm as explained in the JV paper.
213
+ *
214
+ * This is a dense matrix version.
215
+ *
216
+ * \return The closest free column index.
217
+ */
218
+ int_t find_path_dense(
219
+ const uint_t n, cost_t *cost[],
220
+ const int_t start_i,
221
+ int_t *y, cost_t *v,
222
+ int_t *pred)
223
+ {
224
+ uint_t lo = 0, hi = 0;
225
+ int_t final_j = -1;
226
+ uint_t n_ready = 0;
227
+ int_t *cols;
228
+ cost_t *d;
229
+
230
+ NEW(cols, int_t, n);
231
+ NEW(d, cost_t, n);
232
+
233
+ for (uint_t i = 0; i < n; i++) {
234
+ cols[i] = i;
235
+ pred[i] = start_i;
236
+ d[i] = cost[start_i][i] - v[i];
237
+ }
238
+ PRINT_COST_ARRAY(d, n);
239
+ while (final_j == -1) {
240
+ // No columns left on the SCAN list.
241
+ if (lo == hi) {
242
+ PRINTF("%d..%d -> find\n", lo, hi);
243
+ n_ready = lo;
244
+ hi = _find_dense(n, lo, d, cols, y);
245
+ PRINTF("check %d..%d\n", lo, hi);
246
+ PRINT_INDEX_ARRAY(cols, n);
247
+ for (uint_t k = lo; k < hi; k++) {
248
+ const int_t j = cols[k];
249
+ if (y[j] < 0) {
250
+ final_j = j;
251
+ }
252
+ }
253
+ }
254
+ if (final_j == -1) {
255
+ PRINTF("%d..%d -> scan\n", lo, hi);
256
+ final_j = _scan_dense(
257
+ n, cost, &lo, &hi, d, cols, pred, y, v);
258
+ PRINT_COST_ARRAY(d, n);
259
+ PRINT_INDEX_ARRAY(cols, n);
260
+ PRINT_INDEX_ARRAY(pred, n);
261
+ }
262
+ }
263
+
264
+ PRINTF("found final_j=%d\n", final_j);
265
+ PRINT_INDEX_ARRAY(cols, n);
266
+ {
267
+ const cost_t mind = d[cols[lo]];
268
+ for (uint_t k = 0; k < n_ready; k++) {
269
+ const int_t j = cols[k];
270
+ v[j] += d[j] - mind;
271
+ }
272
+ }
273
+
274
+ FREE(cols);
275
+ FREE(d);
276
+
277
+ return final_j;
278
+ }
279
+
280
+
281
+ /** Augment for a dense cost matrix.
282
+ */
283
+ int_t _ca_dense(
284
+ const uint_t n, cost_t *cost[],
285
+ const uint_t n_free_rows,
286
+ int_t *free_rows, int_t *x, int_t *y, cost_t *v)
287
+ {
288
+ int_t *pred;
289
+
290
+ NEW(pred, int_t, n);
291
+
292
+ for (int_t *pfree_i = free_rows; pfree_i < free_rows + n_free_rows; pfree_i++) {
293
+ int_t i = -1, j;
294
+ uint_t k = 0;
295
+
296
+ PRINTF("looking at free_i=%d\n", *pfree_i);
297
+ j = find_path_dense(n, cost, *pfree_i, y, v, pred);
298
+ ASSERT(j >= 0);
299
+ ASSERT(j < n);
300
+ while (i != *pfree_i) {
301
+ PRINTF("augment %d\n", j);
302
+ PRINT_INDEX_ARRAY(pred, n);
303
+ i = pred[j];
304
+ PRINTF("y[%d]=%d -> %d\n", j, y[j], i);
305
+ y[j] = i;
306
+ PRINT_INDEX_ARRAY(x, n);
307
+ SWAP_INDICES(j, x[i]);
308
+ k++;
309
+ if (k >= n) {
310
+ ASSERT(FALSE);
311
+ }
312
+ }
313
+ }
314
+ FREE(pred);
315
+ return 0;
316
+ }
317
+
318
+
319
+ /** Solve dense sparse LAP.
320
+ */
321
+ int lapjv_internal(
322
+ const uint_t n, cost_t *cost[],
323
+ int_t *x, int_t *y)
324
+ {
325
+ int ret;
326
+ int_t *free_rows;
327
+ cost_t *v;
328
+
329
+ NEW(free_rows, int_t, n);
330
+ NEW(v, cost_t, n);
331
+ ret = _ccrrt_dense(n, cost, free_rows, x, y, v);
332
+ int i = 0;
333
+ while (ret > 0 && i < 2) {
334
+ ret = _carr_dense(n, cost, ret, free_rows, x, y, v);
335
+ i++;
336
+ }
337
+ if (ret > 0) {
338
+ ret = _ca_dense(n, cost, ret, free_rows, x, y, v);
339
+ }
340
+ FREE(v);
341
+ FREE(free_rows);
342
+ return ret;
343
+ }
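For intuition, `lapjv_internal` above returns in `x`/`y` a minimum-cost one-to-one row-to-column assignment. A throwaway brute-force reference (illustrative Python, not part of the repo) makes that invariant easy to check on tiny cost matrices:

```python
from itertools import permutations

def brute_force_lap(cost):
    """Exhaustive reference for the square linear assignment problem that
    lapjv_internal solves: pick one column per row minimizing total cost.
    Only feasible for tiny matrices (n! permutations). Illustrative only."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return list(best_perm), best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
rowsol, total = brute_force_lap(cost)
print(rowsol, total)  # row i is assigned to column rowsol[i]
```

The JV algorithm reaches the same optimum in roughly cubic time instead of factorial, which is why the tracker can afford to run it every frame.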
deploy/ncnn/cpp/src/utils.cpp ADDED
@@ -0,0 +1,429 @@
+ #include "BYTETracker.h"
+ #include "lapjv.h"
+
+ vector<STrack*> BYTETracker::joint_stracks(vector<STrack*> &tlista, vector<STrack> &tlistb)
+ {
+     map<int, int> exists;
+     vector<STrack*> res;
+     for (int i = 0; i < tlista.size(); i++)
+     {
+         exists.insert(pair<int, int>(tlista[i]->track_id, 1));
+         res.push_back(tlista[i]);
+     }
+     for (int i = 0; i < tlistb.size(); i++)
+     {
+         int tid = tlistb[i].track_id;
+         if (exists.count(tid) == 0)
+         {
+             exists[tid] = 1;
+             res.push_back(&tlistb[i]);
+         }
+     }
+     return res;
+ }
+
+ vector<STrack> BYTETracker::joint_stracks(vector<STrack> &tlista, vector<STrack> &tlistb)
+ {
+     map<int, int> exists;
+     vector<STrack> res;
+     for (int i = 0; i < tlista.size(); i++)
+     {
+         exists.insert(pair<int, int>(tlista[i].track_id, 1));
+         res.push_back(tlista[i]);
+     }
+     for (int i = 0; i < tlistb.size(); i++)
+     {
+         int tid = tlistb[i].track_id;
+         if (exists.count(tid) == 0)
+         {
+             exists[tid] = 1;
+             res.push_back(tlistb[i]);
+         }
+     }
+     return res;
+ }
+
+ vector<STrack> BYTETracker::sub_stracks(vector<STrack> &tlista, vector<STrack> &tlistb)
+ {
+     map<int, STrack> stracks;
+     for (int i = 0; i < tlista.size(); i++)
+     {
+         stracks.insert(pair<int, STrack>(tlista[i].track_id, tlista[i]));
+     }
+     for (int i = 0; i < tlistb.size(); i++)
+     {
+         int tid = tlistb[i].track_id;
+         if (stracks.count(tid) != 0)
+         {
+             stracks.erase(tid);
+         }
+     }
+
+     vector<STrack> res;
+     std::map<int, STrack>::iterator it;
+     for (it = stracks.begin(); it != stracks.end(); ++it)
+     {
+         res.push_back(it->second);
+     }
+
+     return res;
+ }
+
+ void BYTETracker::remove_duplicate_stracks(vector<STrack> &resa, vector<STrack> &resb, vector<STrack> &stracksa, vector<STrack> &stracksb)
+ {
+     vector<vector<float> > pdist = iou_distance(stracksa, stracksb);
+     vector<pair<int, int> > pairs;
+     for (int i = 0; i < pdist.size(); i++)
+     {
+         for (int j = 0; j < pdist[i].size(); j++)
+         {
+             if (pdist[i][j] < 0.15)
+             {
+                 pairs.push_back(pair<int, int>(i, j));
+             }
+         }
+     }
+
+     vector<int> dupa, dupb;
+     for (int i = 0; i < pairs.size(); i++)
+     {
+         int timep = stracksa[pairs[i].first].frame_id - stracksa[pairs[i].first].start_frame;
+         int timeq = stracksb[pairs[i].second].frame_id - stracksb[pairs[i].second].start_frame;
+         if (timep > timeq)
+             dupb.push_back(pairs[i].second);
+         else
+             dupa.push_back(pairs[i].first);
+     }
+
+     for (int i = 0; i < stracksa.size(); i++)
+     {
+         vector<int>::iterator iter = find(dupa.begin(), dupa.end(), i);
+         if (iter == dupa.end())
+         {
+             resa.push_back(stracksa[i]);
+         }
+     }
+
+     for (int i = 0; i < stracksb.size(); i++)
+     {
+         vector<int>::iterator iter = find(dupb.begin(), dupb.end(), i);
+         if (iter == dupb.end())
+         {
+             resb.push_back(stracksb[i]);
+         }
+     }
+ }
+
+ void BYTETracker::linear_assignment(vector<vector<float> > &cost_matrix, int cost_matrix_size, int cost_matrix_size_size, float thresh,
+     vector<vector<int> > &matches, vector<int> &unmatched_a, vector<int> &unmatched_b)
+ {
+     if (cost_matrix.size() == 0)
+     {
+         for (int i = 0; i < cost_matrix_size; i++)
+         {
+             unmatched_a.push_back(i);
+         }
+         for (int i = 0; i < cost_matrix_size_size; i++)
+         {
+             unmatched_b.push_back(i);
+         }
+         return;
+     }
+
+     vector<int> rowsol; vector<int> colsol;
+     float c = lapjv(cost_matrix, rowsol, colsol, true, thresh);
+     for (int i = 0; i < rowsol.size(); i++)
+     {
+         if (rowsol[i] >= 0)
+         {
+             vector<int> match;
+             match.push_back(i);
+             match.push_back(rowsol[i]);
+             matches.push_back(match);
+         }
+         else
+         {
+             unmatched_a.push_back(i);
+         }
+     }
+
+     for (int i = 0; i < colsol.size(); i++)
+     {
+         if (colsol[i] < 0)
+         {
+             unmatched_b.push_back(i);
+         }
+     }
+ }
+
+ vector<vector<float> > BYTETracker::ious(vector<vector<float> > &atlbrs, vector<vector<float> > &btlbrs)
+ {
+     vector<vector<float> > ious;
+     if (atlbrs.size()*btlbrs.size() == 0)
+         return ious;
+
+     ious.resize(atlbrs.size());
+     for (int i = 0; i < ious.size(); i++)
+     {
+         ious[i].resize(btlbrs.size());
+     }
+
+     //bbox_ious
+     for (int k = 0; k < btlbrs.size(); k++)
+     {
+         vector<float> ious_tmp;
+         float box_area = (btlbrs[k][2] - btlbrs[k][0] + 1)*(btlbrs[k][3] - btlbrs[k][1] + 1);
+         for (int n = 0; n < atlbrs.size(); n++)
+         {
+             float iw = min(atlbrs[n][2], btlbrs[k][2]) - max(atlbrs[n][0], btlbrs[k][0]) + 1;
+             if (iw > 0)
+             {
+                 float ih = min(atlbrs[n][3], btlbrs[k][3]) - max(atlbrs[n][1], btlbrs[k][1]) + 1;
+                 if (ih > 0)
+                 {
+                     float ua = (atlbrs[n][2] - atlbrs[n][0] + 1)*(atlbrs[n][3] - atlbrs[n][1] + 1) + box_area - iw * ih;
+                     ious[n][k] = iw * ih / ua;
+                 }
+                 else
+                 {
+                     ious[n][k] = 0.0;
+                 }
+             }
+             else
+             {
+                 ious[n][k] = 0.0;
+             }
+         }
+     }
+
+     return ious;
+ }
+
+ vector<vector<float> > BYTETracker::iou_distance(vector<STrack*> &atracks, vector<STrack> &btracks, int &dist_size, int &dist_size_size)
+ {
+     vector<vector<float> > cost_matrix;
+     if (atracks.size() * btracks.size() == 0)
+     {
+         dist_size = atracks.size();
+         dist_size_size = btracks.size();
+         return cost_matrix;
+     }
+     vector<vector<float> > atlbrs, btlbrs;
+     for (int i = 0; i < atracks.size(); i++)
+     {
+         atlbrs.push_back(atracks[i]->tlbr);
+     }
+     for (int i = 0; i < btracks.size(); i++)
+     {
+         btlbrs.push_back(btracks[i].tlbr);
+     }
+
+     dist_size = atracks.size();
+     dist_size_size = btracks.size();
+
+     vector<vector<float> > _ious = ious(atlbrs, btlbrs);
+
+     for (int i = 0; i < _ious.size(); i++)
+     {
+         vector<float> _iou;
+         for (int j = 0; j < _ious[i].size(); j++)
+         {
+             _iou.push_back(1 - _ious[i][j]);
+         }
+         cost_matrix.push_back(_iou);
+     }
+
+     return cost_matrix;
+ }
+
+ vector<vector<float> > BYTETracker::iou_distance(vector<STrack> &atracks, vector<STrack> &btracks)
+ {
+     vector<vector<float> > atlbrs, btlbrs;
+     for (int i = 0; i < atracks.size(); i++)
+     {
+         atlbrs.push_back(atracks[i].tlbr);
+     }
+     for (int i = 0; i < btracks.size(); i++)
+     {
+         btlbrs.push_back(btracks[i].tlbr);
+     }
+
+     vector<vector<float> > _ious = ious(atlbrs, btlbrs);
+     vector<vector<float> > cost_matrix;
+     for (int i = 0; i < _ious.size(); i++)
+     {
+         vector<float> _iou;
+         for (int j = 0; j < _ious[i].size(); j++)
+         {
+             _iou.push_back(1 - _ious[i][j]);
+         }
+         cost_matrix.push_back(_iou);
+     }
+
+     return cost_matrix;
+ }
+
+ double BYTETracker::lapjv(const vector<vector<float> > &cost, vector<int> &rowsol, vector<int> &colsol,
+     bool extend_cost, float cost_limit, bool return_cost)
+ {
+     vector<vector<float> > cost_c;
+     cost_c.assign(cost.begin(), cost.end());
+
+     vector<vector<float> > cost_c_extended;
+
+     int n_rows = cost.size();
+     int n_cols = cost[0].size();
+     rowsol.resize(n_rows);
+     colsol.resize(n_cols);
+
+     int n = 0;
+     if (n_rows == n_cols)
+     {
+         n = n_rows;
+     }
+     else
+     {
+         if (!extend_cost)
+         {
+             cout << "set extend_cost=True" << endl;
+             system("pause");
+             exit(0);
+         }
+     }
+
+     if (extend_cost || cost_limit < LONG_MAX)
+     {
+         n = n_rows + n_cols;
+         cost_c_extended.resize(n);
+         for (int i = 0; i < cost_c_extended.size(); i++)
+             cost_c_extended[i].resize(n);
+
+         if (cost_limit < LONG_MAX)
+         {
+             for (int i = 0; i < cost_c_extended.size(); i++)
+             {
+                 for (int j = 0; j < cost_c_extended[i].size(); j++)
+                 {
+                     cost_c_extended[i][j] = cost_limit / 2.0;
+                 }
+             }
+         }
+         else
+         {
+             float cost_max = -1;
+             for (int i = 0; i < cost_c.size(); i++)
+             {
+                 for (int j = 0; j < cost_c[i].size(); j++)
+                 {
+                     if (cost_c[i][j] > cost_max)
+                         cost_max = cost_c[i][j];
+                 }
+             }
+             for (int i = 0; i < cost_c_extended.size(); i++)
+             {
+                 for (int j = 0; j < cost_c_extended[i].size(); j++)
+                 {
+                     cost_c_extended[i][j] = cost_max + 1;
+                 }
+             }
+         }
+
+         for (int i = n_rows; i < cost_c_extended.size(); i++)
+         {
+             for (int j = n_cols; j < cost_c_extended[i].size(); j++)
+             {
+                 cost_c_extended[i][j] = 0;
+             }
+         }
+         for (int i = 0; i < n_rows; i++)
+         {
+             for (int j = 0; j < n_cols; j++)
+             {
+                 cost_c_extended[i][j] = cost_c[i][j];
+             }
+         }
+
+         cost_c.clear();
+         cost_c.assign(cost_c_extended.begin(), cost_c_extended.end());
+     }
+
+     double **cost_ptr;
+     cost_ptr = new double *[n];
+     for (int i = 0; i < n; i++)
+         cost_ptr[i] = new double[n];
+
+     for (int i = 0; i < n; i++)
+     {
+         for (int j = 0; j < n; j++)
+         {
+             cost_ptr[i][j] = cost_c[i][j];
+         }
+     }
+
+     int *x_c = new int[n];
+     int *y_c = new int[n];
+
+     int ret = lapjv_internal(n, cost_ptr, x_c, y_c);
+     if (ret != 0)
+     {
+         cout << "lapjv_internal failed!" << endl;
+         system("pause");
+         exit(0);
+     }
+
+     double opt = 0.0;
+
+     if (n != n_rows)
+     {
+         for (int i = 0; i < n; i++)
+         {
+             if (x_c[i] >= n_cols)
+                 x_c[i] = -1;
+             if (y_c[i] >= n_rows)
+                 y_c[i] = -1;
+         }
+         for (int i = 0; i < n_rows; i++)
+         {
+             rowsol[i] = x_c[i];
+         }
+         for (int i = 0; i < n_cols; i++)
+         {
+             colsol[i] = y_c[i];
+         }
+
+         if (return_cost)
+         {
+             for (int i = 0; i < rowsol.size(); i++)
+             {
+                 if (rowsol[i] != -1)
+                 {
+                     //cout << i << "\t" << rowsol[i] << "\t" << cost_ptr[i][rowsol[i]] << endl;
+                     opt += cost_ptr[i][rowsol[i]];
+                 }
+             }
+         }
+     }
+     else if (return_cost)
+     {
+         for (int i = 0; i < rowsol.size(); i++)
+         {
+             opt += cost_ptr[i][rowsol[i]];
+         }
+     }
+
+     for (int i = 0; i < n; i++)
+     {
+         delete[] cost_ptr[i];
+     }
+     delete[] cost_ptr;
+     delete[] x_c;
+     delete[] y_c;
+
+     return opt;
+ }
+
+ Scalar BYTETracker::get_color(int idx)
+ {
+     idx += 3;
+     return Scalar(37 * idx % 255, 17 * idx % 255, 29 * idx % 255);
+ }
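`BYTETracker::ious` above uses the pixel-inclusive "+1" convention for box widths and heights, and `iou_distance` turns the result into a cost matrix as `1 - IoU`. A small NumPy sketch of the same IoU computation (illustrative names, not the repo's API):

```python
import numpy as np

def iou_matrix(atlbrs, btlbrs):
    """IoU between two box sets in tlbr format, using the same
    '+1' pixel-inclusive convention as BYTETracker::ious above.
    Sketch for illustration; a real implementation would vectorize."""
    a = np.asarray(atlbrs, dtype=float)
    b = np.asarray(btlbrs, dtype=float)
    ious = np.zeros((len(a), len(b)))
    for n, box_a in enumerate(a):
        for k, box_b in enumerate(b):
            iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + 1
            ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + 1
            if iw > 0 and ih > 0:
                area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
                area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
                ious[n, k] = iw * ih / (area_a + area_b - iw * ih)
    return ious

m = iou_matrix([[0, 0, 9, 9]], [[0, 0, 9, 9], [20, 20, 29, 29]])
print(m)  # identical boxes -> 1.0, disjoint boxes -> 0.0
```

The cost matrix fed to `lapjv` in `linear_assignment` is then simply `1 - m`, so perfectly overlapping track/detection pairs have cost 0.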
deploy/scripts/export_onnx.py ADDED
@@ -0,0 +1,102 @@
+ from loguru import logger
+
+ import torch
+ from torch import nn
+
+ from yolox.exp import get_exp
+ from yolox.models.network_blocks import SiLU
+ from yolox.utils import replace_module
+
+ import argparse
+ import os
+
+
+ def make_parser():
+     parser = argparse.ArgumentParser("YOLOX onnx deploy")
+     parser.add_argument(
+         "--output-name", type=str, default="ocsort.onnx", help="output name of the model"
+     )
+     parser.add_argument(
+         "--input", default="images", type=str, help="input node name of onnx model"
+     )
+     parser.add_argument(
+         "--output", default="output", type=str, help="output node name of onnx model"
+     )
+     parser.add_argument(
+         "-o", "--opset", default=11, type=int, help="onnx opset version"
+     )
+     parser.add_argument("--no-onnxsim", action="store_true", help="skip the onnx-simplifier pass")
+     parser.add_argument(
+         "-f",
+         "--exp_file",
+         default=None,
+         type=str,
+         help="experiment description file",
+     )
+     parser.add_argument("-expn", "--experiment-name", type=str, default=None)
+     parser.add_argument("-n", "--name", type=str, default=None, help="model name")
+     parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt path")
+     parser.add_argument(
+         "opts",
+         help="Modify config options using the command-line",
+         default=None,
+         nargs=argparse.REMAINDER,
+     )
+
+     return parser
+
+
+ @logger.catch
+ def main():
+     args = make_parser().parse_args()
+     logger.info("args value: {}".format(args))
+     exp = get_exp(args.exp_file, args.name)
+     exp.merge(args.opts)
+
+     if not args.experiment_name:
+         args.experiment_name = exp.exp_name
+
+     model = exp.get_model()
+     if args.ckpt is None:
+         file_name = os.path.join(exp.output_dir, args.experiment_name)
+         ckpt_file = os.path.join(file_name, "best_ckpt.pth.tar")
+     else:
+         ckpt_file = args.ckpt
+
+     # load the model state dict
+     ckpt = torch.load(ckpt_file, map_location="cpu")
+
+     model.eval()
+     if "model" in ckpt:
+         ckpt = ckpt["model"]
+     model.load_state_dict(ckpt)
+     model = replace_module(model, nn.SiLU, SiLU)
+     model.head.decode_in_inference = False
+
+     logger.info("loading checkpoint done.")
+     dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1])
+     torch.onnx._export(
+         model,
+         dummy_input,
+         args.output_name,
+         input_names=[args.input],
+         output_names=[args.output],
+         opset_version=args.opset,
+     )
+     logger.info("generated onnx model named {}".format(args.output_name))
+
+     if not args.no_onnxsim:
+         import onnx
+
+         from onnxsim import simplify
+
+         # use onnx-simplifier to remove redundant nodes from the exported model.
+         onnx_model = onnx.load(args.output_name)
+         model_simp, check = simplify(onnx_model)
+         assert check, "Simplified ONNX model could not be validated"
+         onnx.save(model_simp, args.output_name)
+         logger.info("generated simplified onnx model named {}".format(args.output_name))
+
+
+ if __name__ == "__main__":
+     main()
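The export script above bakes a fixed `test_size` into the ONNX graph, so at inference time each frame must be resized-with-padding (letterboxed) to that shape before being fed to the session. A hedged NumPy sketch of YOLOX-style letterboxing (the pad value, layout, and nearest-neighbour resize here are assumptions for illustration; the repo's own preprocessing may differ):

```python
import numpy as np

def letterbox(img, input_size):
    """Resize by the min ratio, then pad the remainder with a constant,
    YOLOX style. Sketch only: uses a dependency-free nearest-neighbour
    resize; real pipelines typically use cv2.resize with interpolation."""
    h, w = img.shape[:2]
    r = min(input_size[0] / h, input_size[1] / w)
    nh, nw = int(h * r), int(w * r)
    # nearest-neighbour index maps for the resize
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    padded = np.full((input_size[0], input_size[1], 3), 114, dtype=img.dtype)
    padded[:nh, :nw] = resized
    return padded, r

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
blob, ratio = letterbox(frame, (800, 1440))
print(blob.shape)
```

Keeping the scale ratio is important: detections produced on the letterboxed image are divided by `ratio` to map them back to the original frame.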
deploy/scripts/trt.py ADDED
@@ -0,0 +1,74 @@
+ from loguru import logger
+
+ import tensorrt as trt
+ import torch
+ from torch2trt import torch2trt
+
+ from yolox.exp import get_exp
+
+ import argparse
+ import os
+ import shutil
+
+
+ def make_parser():
+     parser = argparse.ArgumentParser("YOLOX TensorRT deploy")
+     parser.add_argument("-expn", "--experiment-name", type=str, default=None)
+     parser.add_argument("-n", "--name", type=str, default=None, help="model name")
+
+     parser.add_argument(
+         "-f",
+         "--exp_file",
+         default=None,
+         type=str,
+         help="experiment description file",
+     )
+     parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt path")
+     return parser
+
+
+ @logger.catch
+ def main():
+     args = make_parser().parse_args()
+     exp = get_exp(args.exp_file, args.name)
+     if not args.experiment_name:
+         args.experiment_name = exp.exp_name
+
+     model = exp.get_model()
+     file_name = os.path.join(exp.output_dir, args.experiment_name)
+     os.makedirs(file_name, exist_ok=True)
+     if args.ckpt is None:
+         ckpt_file = os.path.join(file_name, "best_ckpt.pth.tar")
+     else:
+         ckpt_file = args.ckpt
+
+     # load the model state dict
+     ckpt = torch.load(ckpt_file, map_location="cpu")
+
+     model.load_state_dict(ckpt["model"])
+     logger.info("loaded checkpoint.")
+     model.eval()
+     model.cuda()
+     model.head.decode_in_inference = False
+     x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
+     model_trt = torch2trt(
+         model,
+         [x],
+         fp16_mode=True,
+         log_level=trt.Logger.INFO,
+         max_workspace_size=(1 << 32),
+     )
+     torch.save(model_trt.state_dict(), os.path.join(file_name, "model_trt.pth"))
+     logger.info("TensorRT model conversion done.")
+     engine_file = os.path.join(file_name, "model_trt.engine")
+     engine_file_demo = os.path.join("deploy", "TensorRT", "cpp", "model_trt.engine")
+     with open(engine_file, "wb") as f:
+         f.write(model_trt.engine.serialize())
+
+     shutil.copyfile(engine_file, engine_file_demo)
+
+     logger.info("The serialized TensorRT engine file is saved for C++ inference.")
+
+
+ if __name__ == "__main__":
+     main()
docs/DEPLOY.md ADDED
@@ -0,0 +1,38 @@
+ # Deployment
+
+ We provide support for several popular deployment tools. This part is built upon the implementation of [YOLOX Deployment](https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo) and [the adaptation by ByteTrack](https://github.com/ifzhang/ByteTrack/tree/main/deploy).
+
+
+ ## ONNX support
+
+ 1. Convert the PyTorch model to an ONNX checkpoint; we provide an example here.
+ ```shell
+ # In practice you may want a smaller model for faster inference.
+ python deploy/scripts/export_onnx.py --output-name ocsort.onnx -f exps/example/mot/yolox_x_mix_det.py -c pretrained/bytetrack_x_mot17.pth.tar
+ ```
+
+ 2. Run on the provided demo video:
+ ```shell
+ cd $OCSORT_HOME/deploy/ONNXRuntime
+ python onnx_inference.py
+ ```
+
+ ## TensorRT support (Python)
+
+ 1. Follow the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) and [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt) to install TensorRT (version 7 recommended) and torch2trt.
+
+ 2. Convert the model:
+ ```shell
+ # You have to download the checkpoint bytetrack_s_mot17.pth.tar from the ByteTrack model zoo.
+ python3 deploy/scripts/trt.py -f exps/example/mot/yolox_s_mix_det.py -c pretrained/bytetrack_s_mot17.pth.tar
+ ```
+
+ 3. Run on a demo video:
+ ```shell
+ python3 tools/demo_track.py video -f exps/example/mot/yolox_s_mix_det.py --trt --save_result
+ ```
+
+ *Note: We haven't validated the C++ support for TensorRT yet; please refer to the [ByteTrack guidance](https://github.com/ifzhang/ByteTrack/tree/main/deploy/TensorRT/cpp) for adaptation for now.*
+
+ ## ncnn support
+ Please follow the [guidelines](https://github.com/ifzhang/ByteTrack/tree/main/deploy/ncnn/cpp) from ByteTrack to deploy with ncnn.
exps/SU-T-ReID.py ADDED
@@ -0,0 +1,162 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # encoding: utf-8
2
+ import os
3
+ import random
4
+ import torch
5
+ import torch.nn as nn
6
+ import torch.distributed as dist
7
+
8
+ from yolox.exp import Exp as MyExp
9
+ from yolox.data import get_yolox_datadir
10
+
11
+ class Exp(MyExp):
12
+ def __init__(self):
13
+ super(Exp, self).__init__()
14
+ self.num_classes = 1
15
+ self.depth = 1.33
16
+ self.width = 1.25
17
+ self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
18
+ self.train_ann = "train.json"
19
+ self.val_ann = "test.json"
20
+ self.input_size = (800, 1440)
21
+ self.test_size = (800, 1440)
22
+ self.random_size = (18, 32)
23
+ self.max_epoch = 80
24
+ self.print_interval = 20
25
+ self.eval_interval = 5
26
+ self.test_conf = 0.1
27
+ self.nmsthre = 0.7
28
+ self.no_aug_epochs = 10
29
+ self.basic_lr_per_img = 0.001 / 64.0
30
+ self.warmup_epochs = 1
31
+
32
+ # tracking params
33
+ self.ckpt = "Checkpoint.pth.tar"
34
+ self.use_byte = True
35
+ self.dataset = "mft25"
36
+ self.inertia = 0.05
37
+ self.iou_thresh = 0.25
38
+ self.asso = "fishiou"
39
+ self.TCM_first_step = True
40
+ self.TCM_byte_step = True
41
+ self.TCM_first_step_weight = 1.0
42
+ self.TCM_byte_step_weight = 1.0
43
+ self.with_reid = True
44
+ self.with_fastreid =True
45
+ self.EG_weight_high_score= 1.3
46
+ self.EG_weight_low_score= 1.2
47
+
48
+ self.fast_reid_config = "fast_reid/configs/SBS_S101.yml"
49
+ self.fast_reid_weights = "ReID-Checkpoint.pth"
50
+
51
+ self.with_longterm_reid_correction = True
52
+ self.longterm_reid_correction_thresh = 0.4
53
+ self.longterm_reid_correction_thresh_low = 0.4
54
+
55
+ def get_data_loader(self, batch_size, is_distributed, no_aug=False):
56
+ from yolox.data import (
57
+ MOTDataset,
58
+ TrainTransform,
59
+ YoloBatchSampler,
60
+ DataLoader,
61
+ InfiniteSampler,
62
+ MosaicDetection,
63
+ )
64
+
65
+ dataset = MOTDataset(
66
+ data_dir=os.path.join(get_yolox_datadir(), "mft25"),
67
+ json_file=self.train_ann,
68
+ name='',
69
+ img_size=self.input_size,
70
+ preproc=TrainTransform(
71
+ rgb_means=(0.485, 0.456, 0.406),
72
+ std=(0.229, 0.224, 0.225),
73
+ max_labels=500,
74
+ ),
75
+ )
76
+
77
+ dataset = MosaicDetection(
78
+ dataset,
79
+ mosaic=not no_aug,
80
+ img_size=self.input_size,
81
+ preproc=TrainTransform(
82
+ rgb_means=(0.485, 0.456, 0.406),
83
+ std=(0.229, 0.224, 0.225),
84
+ max_labels=1000,
85
+ ),
86
+ degrees=self.degrees,
87
+ translate=self.translate,
88
+ scale=self.scale,
89
+ shear=self.shear,
90
+ perspective=self.perspective,
91
+ enable_mixup=self.enable_mixup,
92
+ )
93
+
94
+         self.dataset = dataset
+
+         if is_distributed:
+             batch_size = batch_size // dist.get_world_size()
+
+         sampler = InfiniteSampler(
+             len(self.dataset), seed=self.seed if self.seed else 0
+         )
+
+         batch_sampler = YoloBatchSampler(
+             sampler=sampler,
+             batch_size=batch_size,
+             drop_last=False,
+             input_dimension=self.input_size,
+             mosaic=not no_aug,
+         )
+
+         dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
+         dataloader_kwargs["batch_sampler"] = batch_sampler
+         train_loader = DataLoader(self.dataset, **dataloader_kwargs)
+
+         return train_loader
+
+     def get_eval_loader(self, batch_size, is_distributed, testdev=False, run_tracking=False):  # [hgx0411] dataloader related
+         from yolox.data import MOTDataset, ValTransform
+
+         valdataset = MOTDataset(
+             data_dir=os.path.join(get_yolox_datadir(), "mft25"),
+             json_file=self.val_ann,
+             img_size=self.test_size,
+             name='test',
+             preproc=ValTransform(
+                 rgb_means=(0.485, 0.456, 0.406),
+                 std=(0.229, 0.224, 0.225),
+             ),
+             run_tracking=run_tracking
+         )
+
+         if is_distributed:
+             batch_size = batch_size // dist.get_world_size()
+             sampler = torch.utils.data.distributed.DistributedSampler(
+                 valdataset, shuffle=False
+             )
+         else:
+             sampler = torch.utils.data.SequentialSampler(valdataset)
+
+         dataloader_kwargs = {
+             "num_workers": self.data_num_workers,
+             "pin_memory": True,
+             "sampler": sampler,
+         }
+         dataloader_kwargs["batch_size"] = batch_size
+         val_loader = torch.utils.data.DataLoader(valdataset, **dataloader_kwargs)
+
+         return val_loader
+
+     def get_evaluator(self, batch_size, is_distributed, testdev=False):
+         from yolox.evaluators import COCOEvaluator
+
+         val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev, run_tracking=False)  # [hgx0411] dataloader related
+         evaluator = COCOEvaluator(
+             dataloader=val_loader,
+             img_size=self.test_size,
+             confthre=self.test_conf,
+             nmsthre=self.nmsthre,
+             num_classes=self.num_classes,
+             testdev=testdev,
+         )
+         return evaluator
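Both loaders above divide the global batch size by the DDP world size with integer division. A minimal stand-alone sketch of that bookkeeping (the helper name and the example values are hypothetical, not part of the repo):

```python
def per_rank_batch_size(global_batch_size: int, world_size: int) -> int:
    """Mirror of the `batch_size // dist.get_world_size()` split used in the
    loaders above. Integer division, so the global batch size should be
    divisible by the number of ranks to avoid silently shrinking the batch."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world size")
    return global_batch_size // world_size

# e.g. a global batch of 16 spread over 4 GPUs -> 4 images per rank
print(per_rank_batch_size(16, 4))  # -> 4
```

The exp files skip the divisibility check and simply floor-divide; the check here just makes the assumption explicit.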
exps/SU-T.py ADDED
@@ -0,0 +1,152 @@
+ # encoding: utf-8
+ import os
+ import random
+ import torch
+ import torch.nn as nn
+ import torch.distributed as dist
+
+ from yolox.exp import Exp as MyExp
+ from yolox.data import get_yolox_datadir
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.num_classes = 1
+         self.depth = 1.33
+         self.width = 1.25
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
+         self.train_ann = "train.json"
+         self.val_ann = "test.json"
+         self.input_size = (800, 1440)
+         self.test_size = (800, 1440)
+         self.random_size = (18, 32)
+         self.max_epoch = 80
+         self.print_interval = 20
+         self.eval_interval = 5
+         self.test_conf = 0.1
+         self.nmsthre = 0.7
+         self.no_aug_epochs = 10
+         self.basic_lr_per_img = 0.001 / 64.0
+         self.warmup_epochs = 1
+
+         # tracking params
+         self.ckpt = "Checkpoint.pth.tar"
+         self.use_byte = True
+         self.dataset = "mft25"
+         self.inertia = 0.05
+         self.iou_thresh = 0.25
+         self.asso = "fishiou"
+         self.TCM_first_step = True
+         self.TCM_byte_step = True
+         self.TCM_first_step_weight = 1.0
+         self.TCM_byte_step_weight = 1.0
+         self.with_reid = False
+
+     def get_data_loader(self, batch_size, is_distributed, no_aug=False):
+         from yolox.data import (
+             MOTDataset,
+             TrainTransform,
+             YoloBatchSampler,
+             DataLoader,
+             InfiniteSampler,
+             MosaicDetection,
+         )
+
+         dataset = MOTDataset(
+             data_dir=os.path.join(get_yolox_datadir(), "mft25"),
+             json_file=self.train_ann,
+             name='',
+             img_size=self.input_size,
+             preproc=TrainTransform(
+                 rgb_means=(0.485, 0.456, 0.406),
+                 std=(0.229, 0.224, 0.225),
+                 max_labels=500,
+             ),
+         )
+
+         dataset = MosaicDetection(
+             dataset,
+             mosaic=not no_aug,
+             img_size=self.input_size,
+             preproc=TrainTransform(
+                 rgb_means=(0.485, 0.456, 0.406),
+                 std=(0.229, 0.224, 0.225),
+                 max_labels=1000,
+             ),
+             degrees=self.degrees,
+             translate=self.translate,
+             scale=self.scale,
+             shear=self.shear,
+             perspective=self.perspective,
+             enable_mixup=self.enable_mixup,
+         )
+
+         self.dataset = dataset
+
+         if is_distributed:
+             batch_size = batch_size // dist.get_world_size()
+
+         sampler = InfiniteSampler(
+             len(self.dataset), seed=self.seed if self.seed else 0
+         )
+
+         batch_sampler = YoloBatchSampler(
+             sampler=sampler,
+             batch_size=batch_size,
+             drop_last=False,
+             input_dimension=self.input_size,
+             mosaic=not no_aug,
+         )
+
+         dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
+         dataloader_kwargs["batch_sampler"] = batch_sampler
+         train_loader = DataLoader(self.dataset, **dataloader_kwargs)
+
+         return train_loader
+
+     def get_eval_loader(self, batch_size, is_distributed, testdev=False, run_tracking=False):  # [hgx0411] dataloader related
+         from yolox.data import MOTDataset, ValTransform
+
+         valdataset = MOTDataset(
+             data_dir=os.path.join(get_yolox_datadir(), "mft25"),
+             json_file=self.val_ann,
+             img_size=self.test_size,
+             name='test',
+             preproc=ValTransform(
+                 rgb_means=(0.485, 0.456, 0.406),
+                 std=(0.229, 0.224, 0.225),
+             ),
+             run_tracking=run_tracking
+         )
+
+         if is_distributed:
+             batch_size = batch_size // dist.get_world_size()
+             sampler = torch.utils.data.distributed.DistributedSampler(
+                 valdataset, shuffle=False
+             )
+         else:
+             sampler = torch.utils.data.SequentialSampler(valdataset)
+
+         dataloader_kwargs = {
+             "num_workers": self.data_num_workers,
+             "pin_memory": True,
+             "sampler": sampler,
+         }
+         dataloader_kwargs["batch_size"] = batch_size
+         val_loader = torch.utils.data.DataLoader(valdataset, **dataloader_kwargs)
+
+         return val_loader
+
+     def get_evaluator(self, batch_size, is_distributed, testdev=False):
+         from yolox.evaluators import COCOEvaluator
+
+         val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev, run_tracking=False)  # [hgx0411] dataloader related
+         evaluator = COCOEvaluator(
+             dataloader=val_loader,
+             img_size=self.test_size,
+             confthre=self.test_conf,
+             nmsthre=self.nmsthre,
+             num_classes=self.num_classes,
+             testdev=testdev,
+         )
+         return evaluator
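The `rgb_means`/`std` pairs passed to `TrainTransform` and `ValTransform` in the exp files above are the standard ImageNet statistics. A self-contained sketch of the per-channel normalization they imply (the helper name and the sample pixel are hypothetical, for illustration only):

```python
# ImageNet channel statistics, as used by the transforms above
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one RGB pixel already scaled to [0, 1]:
    (x - mean) / std, applied channel-wise."""
    return tuple((x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

# a mid-gray pixel ends up slightly positive on every channel,
# because every channel mean is below 0.5
print(normalize_pixel((0.5, 0.5, 0.5)))
```

A pixel equal to the channel means maps to exactly `(0.0, 0.0, 0.0)`, which is the usual sanity check for this normalization.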
exps/default/nano.py ADDED
@@ -0,0 +1,39 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+ import torch.nn as nn
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 0.33
+         self.width = 0.25
+         self.scale = (0.5, 1.5)
+         self.random_size = (10, 20)
+         self.test_size = (416, 416)
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
+         self.enable_mixup = False
+
+     def get_model(self, sublinear=False):
+
+         def init_yolo(M):
+             for m in M.modules():
+                 if isinstance(m, nn.BatchNorm2d):
+                     m.eps = 1e-3
+                     m.momentum = 0.03
+
+         if "model" not in self.__dict__:
+             from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead
+             in_channels = [256, 512, 1024]
+             # The NANO model uses depthwise convolutions (depthwise=True),
+             # which is the main difference from the other models.
+             backbone = YOLOPAFPN(self.depth, self.width, in_channels=in_channels, depthwise=True)
+             head = YOLOXHead(self.num_classes, self.width, in_channels=in_channels, depthwise=True)
+             self.model = YOLOX(backbone, head)
+
+         self.model.apply(init_yolo)
+         self.model.head.initialize_biases(1e-2)
+         return self.model
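Every Exp above derives `exp_name` from its own filename via `os.path.split(os.path.realpath(__file__))[1].split(".")[0]`. A small self-contained sketch of that derivation (the helper name and the example path are hypothetical):

```python
import os

def exp_name_from_path(path: str) -> str:
    """Same derivation the exp files above use: take the basename of the
    file and keep everything before the first dot."""
    return os.path.split(path)[1].split(".")[0]

print(exp_name_from_path("/workspace/exps/default/nano.py"))  # -> nano
```

Note that splitting on the first dot differs from `os.path.splitext`: a name like `exp.v2.py` would yield `exp`, not `exp.v2`.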
exps/default/yolov3.py ADDED
@@ -0,0 +1,89 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+ import torch
+ import torch.nn as nn
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 1.0
+         self.width = 1.0
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
+
+     def get_model(self, sublinear=False):
+         def init_yolo(M):
+             for m in M.modules():
+                 if isinstance(m, nn.BatchNorm2d):
+                     m.eps = 1e-3
+                     m.momentum = 0.03
+
+         if "model" not in self.__dict__:
+             from yolox.models import YOLOX, YOLOFPN, YOLOXHead
+             backbone = YOLOFPN()
+             head = YOLOXHead(self.num_classes, self.width, in_channels=[128, 256, 512], act="lrelu")
+             self.model = YOLOX(backbone, head)
+         self.model.apply(init_yolo)
+         self.model.head.initialize_biases(1e-2)
+
+         return self.model
+
+     def get_data_loader(self, batch_size, is_distributed, no_aug=False):
+         from data.datasets.cocodataset import COCODataset
+         from data.datasets.mosaicdetection import MosaicDetection
+         from data.datasets.data_augment import TrainTransform
+         from data.datasets.dataloading import YoloBatchSampler, DataLoader, InfiniteSampler
+         import torch.distributed as dist
+
+         dataset = COCODataset(
+             data_dir='data/COCO/',
+             json_file=self.train_ann,
+             img_size=self.input_size,
+             preproc=TrainTransform(
+                 rgb_means=(0.485, 0.456, 0.406),
+                 std=(0.229, 0.224, 0.225),
+                 max_labels=50
+             ),
+         )
+
+         dataset = MosaicDetection(
+             dataset,
+             mosaic=not no_aug,
+             img_size=self.input_size,
+             preproc=TrainTransform(
+                 rgb_means=(0.485, 0.456, 0.406),
+                 std=(0.229, 0.224, 0.225),
+                 max_labels=120
+             ),
+             degrees=self.degrees,
+             translate=self.translate,
+             scale=self.scale,
+             shear=self.shear,
+             perspective=self.perspective,
+         )
+
+         self.dataset = dataset
+
+         if is_distributed:
+             batch_size = batch_size // dist.get_world_size()
+             sampler = InfiniteSampler(len(self.dataset), seed=self.seed if self.seed else 0)
+         else:
+             sampler = torch.utils.data.RandomSampler(self.dataset)
+
+         batch_sampler = YoloBatchSampler(
+             sampler=sampler,
+             batch_size=batch_size,
+             drop_last=False,
+             input_dimension=self.input_size,
+             mosaic=not no_aug
+         )
+
+         dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
+         dataloader_kwargs["batch_sampler"] = batch_sampler
+         train_loader = DataLoader(self.dataset, **dataloader_kwargs)
+
+         return train_loader
exps/default/yolox_l.py ADDED
@@ -0,0 +1,15 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 1.0
+         self.width = 1.0
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
exps/default/yolox_m.py ADDED
@@ -0,0 +1,15 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 0.67
+         self.width = 0.75
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
exps/default/yolox_s.py ADDED
@@ -0,0 +1,15 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 0.33
+         self.width = 0.50
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
exps/default/yolox_tiny.py ADDED
@@ -0,0 +1,19 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 0.33
+         self.width = 0.375
+         self.scale = (0.5, 1.5)
+         self.random_size = (10, 20)
+         self.test_size = (416, 416)
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
+         self.enable_mixup = False
exps/default/yolox_x.py ADDED
@@ -0,0 +1,15 @@
+ #!/usr/bin/env python3
+ # -*- coding:utf-8 -*-
+ # Copyright (c) Megvii, Inc. and its affiliates.
+
+ import os
+
+ from yolox.exp import Exp as MyExp
+
+
+ class Exp(MyExp):
+     def __init__(self):
+         super(Exp, self).__init__()
+         self.depth = 1.33
+         self.width = 1.25
+         self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
fast_reid/CHANGELOG.md ADDED
@@ -0,0 +1,39 @@
+ # Changelog
+
+ ### v1.3
+
+ #### New Features
+ - Vision Transformer backbone, see config in `configs/Market1501/bagtricks_vit.yml`
+ - Self-Distillation with EMA update
+ - Gradient Clip
+
+ #### Improvements
+ - Faster dataloader with pre-fetch thread and CUDA stream
+ - Optimize DDP training speed by removing `find_unused_parameters` in DDP
+
+
+ ### v1.2 (06/04/2021)
+
+ #### New Features
+
+ - Multiple machine training support
+ - [RepVGG](https://github.com/DingXiaoH/RepVGG) backbone
+ - [Partial FC](projects/FastFace)
+
+ #### Improvements
+
+ - Torch2trt pipeline
+ - Decouple linear transforms and softmax
+ - Config decorator
+
+ ### v1.1 (29/01/2021)
+
+ #### New Features
+
+ - NAIC20 (ReID track) [1st-place solution](projects/NAIC20)
+ - Multi-teacher Knowledge Distillation
+ - TRT network definition APIs in [FastRT](projects/FastRT)
+
+ #### Bug Fixes
+
+ #### Improvements
@@ -0,0 +1,62 @@
 
+ # Getting Started with FastReID
+
+ ## Prepare pre-trained models
+
+ If you use a backbone supported by FastReID, you do not need to do anything: the pre-trained weights are downloaded automatically.
+ If your machine has no network access, download the pre-trained models manually and put them in `~/.cache/torch/checkpoints`.
+
+ If you want to use other pre-trained models, such as a MoCo pre-trained backbone, download them yourself and set the pre-trained model path in `configs/Base-bagtricks.yml`.
+
+ ## Compile with Cython to accelerate evaluation
+
+ ```bash
+ cd fastreid/evaluation/rank_cylib; make all
+ ```
+
+ ## Training & Evaluation in Command Line
+
+ We provide a script, `tools/train_net.py`, that can train all the configs provided in FastReID.
+ You may want to use it as a reference for writing your own training script.
+
+ To train a model with `train_net.py`, first set up the corresponding datasets following [datasets/README.md](https://github.com/JDAI-CV/fast-reid/tree/master/datasets), then run:
+
+ ```bash
+ python3 tools/train_net.py --config-file ./configs/Market1501/bagtricks_R50.yml MODEL.DEVICE "cuda:0"
+ ```
+
+ The configs are made for 1-GPU training.
+
+ To train a model with 4 GPUs, run:
+
+ ```bash
+ python3 tools/train_net.py --config-file ./configs/Market1501/bagtricks_R50.yml --num-gpus 4
+ ```
+
+ To train a model across multiple machines, run:
+
+ ```bash
+ # machine 1
+ export GLOO_SOCKET_IFNAME=eth0
+ export NCCL_SOCKET_IFNAME=eth0
+
+ python3 tools/train_net.py --config-file configs/Market1501/bagtricks_R50.yml \
+ --num-gpus 4 --num-machines 2 --machine-rank 0 --dist-url tcp://ip:port
+
+ # machine 2
+ export GLOO_SOCKET_IFNAME=eth0
+ export NCCL_SOCKET_IFNAME=eth0
+
+ python3 tools/train_net.py --config-file configs/Market1501/bagtricks_R50.yml \
+ --num-gpus 4 --num-machines 2 --machine-rank 1 --dist-url tcp://ip:port
+ ```
+
+ Make sure the dataset path and code are identical on every machine, and that the machines can reach each other over the network.
+
+ To evaluate a model's performance, use
+
+ ```bash
+ python3 tools/train_net.py --config-file ./configs/Market1501/bagtricks_R50.yml --eval-only \
+ MODEL.WEIGHTS /path/to/checkpoint_file MODEL.DEVICE "cuda:0"
+ ```
+
+ For more options, see `python3 tools/train_net.py -h`.